Re: CQL data type compatibility between ascii and text
Thanks Yoshi! That explains it a lot :)

On Fri, 10 Aug 2018 18:30:25 +1000, Y K wrote: [snip]
Re: CQL data type compatibility between ascii and text
Hi Thira,

First, there's the 3.0 branch of versions and the 3.x branch of versions:
http://cassandra.apache.org/doc/latest/development/patches.html#choosing-the-right-branches-to-work-on

3.0.16 belongs to the 3.0 branch; 3.9 and 3.11.2 belong to the 3.x branch.

I believe the change was made by "Remove alter type support" (https://issues.apache.org/jira/browse/CASSANDRA-12443), which was marked "Fixed" in version 3.0.11 on the 3.0 branch and in version 3.10 on the 3.x branch. So 3.0.16 has the fix, 3.9 doesn't have it, but 3.11.2 has it.

Best regards,
Yoshi

On Fri, 10 Aug 2018 at 17:10, thiranjith wrote: [snip]
CQL data type compatibility between ascii and text
Hi,

According to the documentation at https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cql_data_types_c.html#cql_data_types_c__cql_data_type_compatibility we should not be able to change a column type from ascii to text.

I have had mixed experiences with conversions between data types on different versions of Cassandra. For example, given the following table definition:

    CREATE TABLE changelog (
        sequence int,
        description ascii,
        createdby ascii,
        executedon timestamp,
        PRIMARY KEY (sequence, description)
    );

attempting to change the data type of the column 'createdby' with the following CQL

    alter table changelog alter createdby TYPE text;

gives the behaviour outlined below, depending on the version of Cassandra:

- With [cqlsh 5.0.1 | Cassandra 3.0.16 | CQL spec 3.4.0 | Native protocol v4]: InvalidRequest: Error from server: code=2200 [Invalid query] message="Altering of types is not allowed" (expected, per documentation)
- With [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]: the query succeeds and changes the column type to 'text' (verified by running describe changelog and also by inserting data with non-ASCII chars into the column)
- With Cassandra 3.11.2: InvalidRequest: Error from server: code=2200 [Invalid query] message="Altering of types is not allowed" (expected, per documentation)

Can anyone please explain why it works on 3.9 and not on the others?

Thanks!
Thira
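Independent of which version enforces the check, the reason the compatibility table only ever allowed the ascii -> text direction is an encoding fact: the ascii type holds ASCII bytes, text holds UTF-8, and every ASCII byte sequence is already valid UTF-8, while the reverse does not hold. A minimal, Cassandra-free Python sketch of that invariant (the sample strings are illustrative, not from the thread):

```python
# Why ascii -> text is the byte-compatible direction while text -> ascii
# is not: all ASCII bytes decode as UTF-8, but not all UTF-8 bytes decode
# as ASCII.
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

ascii_bytes = "changelog entry".encode("ascii")  # what an ascii column holds
utf8_bytes = "résumé".encode("utf-8")            # what a text column may hold

assert decodes_as(ascii_bytes, "utf-8")      # existing ascii data reads fine as text
assert not decodes_as(utf8_bytes, "ascii")   # text data need not be valid ascii
```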
Re: [EXTERNAL] full text search on some text columns
Does anyone know if you can do an online upgrade of Elassandra? With the Lucene plugin you cannot, really, because you need to drop and recreate the indexes if Lucene has been updated.

Hannu

> On 1 Aug 2018 at 12:49, Octavian Rinciog wrote: [snip]
Re: [EXTERNAL] full text search on some text columns
Hello! Maybe this will work? https://github.com/strapdata/elassandra (I haven't tested this plugin)

> On 2018-08-01 12:17 GMT+03:00, Hannu Kröger wrote: [snip]

--
Octavian Rinciog
Re: [EXTERNAL] full text search on some text columns
The 3.11.1 plugin works with 3.11.2. But yes, the original maintainer is not maintaining the project anymore. At least not actively.

Hannu

> On 1 Aug 2018 at 7:16, Ben Slater wrote: [snip]
Re: Re: [EXTERNAL] full text search on some text columns
We (Instaclustr) will be submitting a PR for 3.11.3 support for cassandra-lucene-index once 3.11.3 is officially released, as we offer it as part of our service and have customers using it.

Cheers
Ben

> On Wed, 1 Aug 2018 at 14:06, onmstester onmstester wrote: [snip]

--
Ben Slater
Chief Product Officer, Instaclustr

Read our latest technical blog posts here: https://www.instaclustr.com/blog/

This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Fwd: Re: [EXTERNAL] full text search on some text columns
It seems to be an interesting project but sort of abandoned: no update in the last 8 months, and it doesn't support Cassandra 3.11.2 (the version I currently use).

Sent using Zoho Mail

---------- Forwarded message ----------
From: Andrzej Śliwiński
Date: Wed, 01 Aug 2018 08:16:06 +0430
Subject: Re: [EXTERNAL] full text search on some text columns

Maybe this plugin could do the job: https://github.com/Stratio/cassandra-lucene-index
Re: [EXTERNAL] full text search on some text columns
Maybe this plugin could do the job: https://github.com/Stratio/cassandra-lucene-index

> On Tue, 31 Jul 2018 at 22:37, onmstester onmstester wrote: [snip]
RE: [EXTERNAL] full text search on some text columns
Actually, we can't afford to buy DataStax Search.

Sent using Zoho Mail

On Tue, 31 Jul 2018 19:38:28 +0430, Durity, Sean R wrote: [snip]
Re: full text search on some text columns
Thanks Jordan. There would be millions of rows per day; is SASI capable of sustaining such a rate?

Sent using Zoho Mail

On Tue, 31 Jul 2018 19:47:55 +0430, Jordan West wrote: [snip]
Re: full text search on some text columns
I had SASI in mind before stopping myself from replying to this thread. Actually, the OP needs to index a clustering column and the partition key, and as far as I remember I opened a JIRA and pushed a patch for SASI to support indexing composite partition keys, but so far there are some issues preventing it from being merged into trunk:

https://issues.apache.org/jira/browse/CASSANDRA-11734
https://issues.apache.org/jira/browse/CASSANDRA-13228

On Tue, Jul 31, 2018 at 5:17 PM, Jordan West wrote: [snip]
Re: full text search on some text columns
On Tue, Jul 31, 2018 at 7:45 AM, onmstester onmstester wrote:
> I need to do a full text search (like) on one of my clustering keys and one of my partition keys (it uses text as the data type).

For simple LIKE queries on existing columns you could give SASI (https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html) a try without having to stand up a separate piece of software. It's relatively new and isn't as battle-tested as other parts of Cassandra, but it has been used in production. There are some performance issues with wide CQL partitions if you have those (https://issues.apache.org/jira/browse/CASSANDRA-11990); I hope to address that for 4.0, time permitting.

Full disclosure: I was one of the original SASI authors.
RE: [EXTERNAL] full text search on some text columns
That sounds like a problem tailor-made for the DataStax Search (embedded Solr) solution. I think that would be the fastest path to success.

Sean Durity

From: onmstester onmstester
Sent: Tuesday, July 31, 2018 10:46 AM
To: user
Subject: [EXTERNAL] full text search on some text columns

> [snip]

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
full text search on some text columns
I need to do a full text search (LIKE) on one of my clustering keys and one of my partition keys (both use text as the data type). The input rate is high, so only Cassandra can handle it. Is there any open source project which helps with using Cassandra + Solr or Cassandra + Elasticsearch? Any recommendation on doing this with home-made solutions would be appreciated.

Sent using Zoho Mail
Re: Text or....
Depending on the compression rate, I think it would generate less garbage on the Cassandra side if you compressed it client side. Something to test out.

> On Apr 4, 2018, at 7:19 AM, Jeff Jirsa wrote: [snip]
Re: Text or....
Compressing server side and validating checksums is hugely important in the more frequently used versions of Cassandra, so since you probably want to run compression on the server anyway, I'm not sure why you'd compress it twice.

--
Jeff Jirsa

> On Apr 4, 2018, at 6:23 AM, DuyHai Doan wrote: [snip]
Re: Text or....
Compressing client-side is better because it will save:

1) a lot of bandwidth on the network
2) a lot of Cassandra CPU, because there is no decompression server-side
3) a lot of Cassandra heap, because the compressed blob should be relatively small (text data compresses very well) compared to the raw size

On Wed, Apr 4, 2018 at 2:59 PM, Jeronimo de A. Barros wrote: [snip]
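A minimal sketch of the client-side approach described above, using Python's zlib; the driver calls that would actually write the blob to Cassandra are omitted, and the payload is illustrative:

```python
import zlib

def compress_text(value: str) -> bytes:
    """Compress a large text value client-side before storing it in a blob column."""
    return zlib.compress(value.encode("utf-8"), 6)

def decompress_text(blob: bytes) -> str:
    """Reverse of compress_text, applied after reading the blob back."""
    return zlib.decompress(blob).decode("utf-8")

# ~57,000 characters, roughly the row size discussed in this thread.
payload = "some long log line " * 3000
blob = compress_text(payload)

assert decompress_text(blob) == payload        # lossless round trip
assert len(blob) < len(payload.encode("utf-8"))  # smaller on the wire and in heap
```

The trade-off Jeff raises still applies: the server-side table compression and checksumming run regardless, so measure whether the double compression is worth it for your data.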
Re: Text or....
Hi,

We use a pseudo file-system table where the chunks are blobs of 64 KB, and we have never had any performance issue.

The primary-key structure is ((file-uuid), chunk-id).

Jero

On Wed, Apr 4, 2018 at 9:25 AM, shalom sagges wrote: [snip]
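A hedged sketch of the chunking scheme Jero describes, assuming 64 KB chunks and the ((file-uuid), chunk-id) key layout; the helper name and sample data are illustrative, not from the thread:

```python
import uuid

CHUNK_SIZE = 64 * 1024  # 64 KB blobs, as in the pseudo file-system table

def make_chunks(file_id: uuid.UUID, data: bytes):
    """Produce one (file_uuid, chunk_id, blob) row per chunk for a table with
    PRIMARY KEY ((file_uuid), chunk_id): the partition holds the file, the
    clustering column keeps the chunks in order."""
    return [(file_id, chunk_id, data[off:off + CHUNK_SIZE])
            for chunk_id, off in enumerate(range(0, len(data), CHUNK_SIZE))]

file_id = uuid.uuid4()
data = bytes(range(256)) * 1000          # 256,000 bytes of sample data
rows = make_chunks(file_id, data)

assert len(rows) == 4                                    # 256,000 bytes -> 4 chunks
assert all(len(blob) <= CHUNK_SIZE for _, _, blob in rows)
assert b"".join(blob for _, _, blob in rows) == data     # reassembly is lossless
```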
Re: Text or....
Hi Shalom,

You might want to compress on the application side before inserting into Cassandra, using the algorithm of your choice, based on the compression ratio and speed you find acceptable for your use case.

On 4 April 2018 at 14:38, shalom sagges wrote: [snip]
Re: Text or....
Thanks DuyHai!

I'm using the default table compression. Is there anything else I should look into? Regarding table compression, I understand that for write-heavy tables it's best to keep the default and not compress further. Have I understood correctly?

On Wed, Apr 4, 2018 at 3:28 PM, DuyHai Doan wrote: [snip]
Re: Text or....
Compress it and store it as a blob. Unless you ever need to index it, but I guess even with SASI, indexing such a huge text block is not a good idea.

On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges wrote: [snip]
Text or....
Hi All,

A certain application is writing ~55,000 characters for a single row. Most of these characters go into one column with the "text" data type.

This looks insanely large for one row. Would you suggest changing the data type from "text" to blob, or is there any other option that might fit this scenario?

Thanks!
Re: How to Parse raw CQL text?
Yes, ideally. I've been spending a bit of time in the parser the last week. There's a lot of internals which are still using old terminology and are pretty confusing. I'm doing a little investigation into exposing some of the information while also modernizing it.

> On Feb 26, 2018, at 10:02 AM, Hannu Kröger wrote:
> If this is needed functionality, shouldn't it be available as a public method or something? Maybe write a patch, etc.?
>
>> On 26.2.2018 at 18:47, Ariel Weisberg wrote:
>> Hi,
>> I took a similar approach and it worked fine. I was able to build a tool that parsed production query logs. I used a helper method that would just grab a private field out of an object by name using reflection.
>> Ariel
>>
>>> On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote:
>>> I had to do something similar recently. Take a look at org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some sample code here [1] as well as a blog post [2] that explains how to access the private variables, since there's no access provided. It wasn't really designed to be used as a library, so YMMV with future changes.
>>>
>>> [1] https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt
>>> [2] http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/
>>>
>>>> On Mon, Feb 5, 2018 at 2:27 PM, Kant Kodali wrote:
>>>> I just did some trial and error. Looks like this would work:
>>>>
>>>> public class Test {
>>>>     public static void main(String[] args) throws Exception {
>>>>         String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary key (field1) );";
>>>>         ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>>>>         CqlLexer cqlLexer = new CqlLexer(stringStream);
>>>>         CommonTokenStream token = new CommonTokenStream(cqlLexer);
>>>>         CqlParser parser = new CqlParser(token);
>>>>         ParsedStatement query = parser.cqlStatement();
>>>>
>>>>         if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
>>>>             CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query;
>>>>             CFMetaData
>>>>                 .compile(stmt, cts.keyspace())
>>>>                 .getColumnMetadata()
>>>>                 .values()
>>>>                 .stream()
>>>>                 .forEach(cd -> System.out.println(cd));
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali wrote:
>>>> Hi Anant, I just have a CQL CREATE TABLE statement as a string and I want to extract all the parts: table name, keyspace name, regular columns, partition key, clustering key, clustering order, and so on. That's really it! Thanks!
>>>>
>>>> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh wrote:
>>>> I think I understand what you are trying to do, but what is your goal? What do you mean "use it for different" queries? Maybe you want to do an event and have an event processor? Seems like you are trying to basically bypass that pattern and parse a query and split it into several actions? Did you look into this unit test folder?
Re: How to Parse raw CQL text?
If this is needed functionality, shouldn’t that be available as a public method or something? Maybe write a patch etc. ? > Ariel Weisberg <ar...@weisberg.ws> kirjoitti 26.2.2018 kello 18.47: > > Hi, > > I took a similar approach and it worked fine. I was able to build a tool that > parsed production query logs. > > I used a helper method that would just grab a private field out of an object > by name using reflection. > > Ariel > >> On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote: >> I had to do something similar recently. Take a look at >> org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some >> sample code here [1] as well as a blog post [2] that explains how to access >> the private variables, since there's no access provided. It wasn't really >> designed to be used as a library, so YMMV with future changes. >> >> [1] >> https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt >> [2] >> http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/ >> >> On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote: >> I just did some trial and error. 
Looks like this would work >> >> public class Test { >> >> >> >> public static void main(String[] args) throws Exception { >> >> String stmt = "create table if not exists test_keyspace.my_table >> (field1 text, field2 int, field3 set, field4 map<ascii, text>, >> primary key (field1) );"; >> >> ANTLRStringStream stringStream = new ANTLRStringStream(stmt); >> >> CqlLexer cqlLexer = new CqlLexer(stringStream); >> >> CommonTokenStream token = new CommonTokenStream(cqlLexer); >> >> CqlParser parser = new CqlParser(token); >> >> ParsedStatement query = parser.cqlStatement(); >> >> >> if (query.getClass().getDeclaringClass() == >> CreateTableStatement.class) { >> >> CreateTableStatement.RawStatement cts = >> (CreateTableStatement.RawStatement) query; >> >> CFMetaData >> >> .compile(stmt, cts.keyspace()) >> >> >> >> .getColumnMetadata() >> >> .values() >> >> .stream() >> >> .forEach(cd -> System.out.println(cd)); >> >> >> } >>} >> } >> >> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: >> Hi Anant, >> >> I just have CQL create table statement as a string I want to extract all the >> parts like, tableName, KeySpaceName, regular Columns, partitionKey, >> ClusteringKey, Clustering Order and so on. Thats really it! >> >> Thanks! >> >> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> >> wrote: >> I think I understand what you are trying to do … but what is your goal? What >> do you mean “use it for different” queries… Maybe you want to do an event >> and have an event processor? Seems like you are trying to basically by pass >> that pattern and parse a query and split it into several actions? >> >> Did you look into this unit test folder? 
>> >> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java >> >> -- >> Rahul Singh >> rahul.si...@anant.us >> >> Anant Corporation >> >> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: >> >>> Hi All, >>> >>> I have a need where I get a raw CQL create table statement as a String and >>> I need to parse the keyspace, tablename, columns and so on..so I can use it >>> for various queries and send it to C*. I used the example below from this >>> link. I get the following error. And I thought maybe someone in this >>> mailing list will be more familiar with internals. >>> >>> Exception in thread "main" >>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace >>> test_keyspace doesn't exist >>> at >>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200) >>> at com.hello.world.Test.main(Test.java:23) >>> >>> >>> Here is my code. >>> >>> package com.hello.wo
Re: How to Parse raw CQL text?
wouldn't it make sense to expose the parser at some point? On Mon, Feb 26, 2018 at 9:47 AM, Ariel Weisberg <ar...@weisberg.ws> wrote: > Hi, > > I took a similar approach and it worked fine. I was able to build a tool > that parsed production query logs. > > I used a helper method that would just grab a private field out of an > object by name using reflection. > > Ariel > > On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote: > > I had to do something similar recently. Take a look at > org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some > sample code here [1] as well as a blog post [2] that explains how to access > the private variables, since there's no access provided. It wasn't really > designed to be used as a library, so YMMV with future changes. > > [1] https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/ > master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/ > privatevaraccess/CreateTableParser.kt > [2] http://rustyrazorblade.com/post/2018/2018-02-25- > accessing-private-variables-in-jvm/ > > On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote: > > I just did some trial and error. 
Looks like this would work > > *public class *Test { > > *public static void *main(String[] args) *throws *Exception { > > String stmt = *"create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"*; > ANTLRStringStream stringStream = *new *ANTLRStringStream(stmt); > CqlLexer cqlLexer = *new *CqlLexer(stringStream); > CommonTokenStream token = *new *CommonTokenStream(cqlLexer); > CqlParser parser = *new *CqlParser(token); > > ParsedStatement query = parser.cqlStatement(); > > > *if *(query.getClass().getDeclaringClass() == > CreateTableStatement.*class*) { > CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; > > CFMetaData > .*compile*(stmt, cts.keyspace()) > > > .getColumnMetadata() > .values() > .stream() > .forEach(cd -> System.*out*.println(cd)); > > > } > >} > > } > > > On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: > > Hi Anant, > > I just have CQL create table statement as a string I want to extract all > the parts like, tableName, KeySpaceName, regular Columns, partitionKey, > ClusteringKey, Clustering Order and so on. Thats really it! > > Thanks! > > On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> > wrote: > > I think I understand what you are trying to do … but what is your goal? > What do you mean “use it for different” queries… Maybe you want to do an > event and have an event processor? Seems like you are trying to basically > by pass that pattern and parse a query and split it into several actions? > > Did you look into this unit test folder? 
> > https://github.com/apache/cassandra/blob/trunk/test/ > unit/org/apache/cassandra/cql3/CQLTester.java > > -- > Rahul Singh > rahul.si...@anant.us > > Anant Corporation > > On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: > > Hi All, > > I have a need where I get a raw CQL create table statement as a String and > I need to parse the keyspace, tablename, columns and so on..so I can use it > for various queries and send it to C*. I used the example below from this > link <https://github.com/tacoo/cassandra-antlr-sample>. I get the > following error. And I thought maybe someone in this mailing list will be > more familiar with internals. > > Exception in thread "main" > org.apache.cassandra.exceptions.ConfigurationException: > Keyspace test_keyspace doesn't exist > at org.apache.cassandra.cql3.statements.CreateTableStatement$ > RawStatement.prepare(CreateTableStatement.java:200) > at com.hello.world.Test.main(Test.java:23) > > > Here is my code. > > *package *com.hello.world; > > *import *org.antlr.runtime.ANTLRStringStream; > *import *org.antlr.runtime.CommonTokenStream; > *import *org.apache.cassandra.cql3.CqlLexer; > *import *org.apache.cassandra.cql3.CqlParser; > *import *org.apache.cassandra.cql3.statements.CreateTableStatement; > *import *org.apache.cassandra.cql3.statements.ParsedStatement; > > *public class *Test { > > *public static void *main(String[] args) *throws *Exception { > String stmt = *"create table if not exists test_keyspace**.my_table
Re: How to Parse raw CQL text?
Hi, I took a similar approach and it worked fine. I was able to build a tool that parsed production query logs. I used a helper method that would just grab a private field out of an object by name using reflection. Ariel On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote: > I had to do something similar recently. Take a look at > org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got > some sample code here [1] as well as a blog post [2] that explains how > to access the private variables, since there's no access provided. It > wasn't really designed to be used as a library, so YMMV with future > changes.> > [1] > https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt> > [2] > http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/> > > On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote:>> I > just did some trial and error. 
Looks like this would work >> >> *public class *Test { >> >> >> >> *public static void *main(String[] args) *throws *Exception { >>>> String stmt = *"create table if not exists >> test_keyspace.my_table (field1 text, field2 int, field3 >> set, field4 map<ascii, text>, primary key (field1) >> );"*; >>>> ANTLRStringStream stringStream = *new >> *ANTLRStringStream(stmt); >>>> CqlLexer cqlLexer = *new *CqlLexer(stringStream); >> >> CommonTokenStream token = *new *CommonTokenStream(cqlLexer); >>>> CqlParser parser = *new *CqlParser(token); >> >> ParsedStatement query = parser.cqlStatement(); >> >> >> *if *(query.getClass().getDeclaringClass() == >> CreateTableStatement.*class*) { >>>> CreateTableStatement.RawStatement cts = >> (CreateTableStatement.RawStatement) query; >>>> CFMetaData >> >> .*compile*(stmt, cts.keyspace()) >> >> >> >> .getColumnMetadata() >> >> .values() >> >> .stream() >> >> .forEach(cd -> System.**out**.println(cd)); >> >> >> } >>} >> } >> >> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali >> <k...@peernova.com> wrote:>>> Hi Anant, >>> >>> I just have CQL create table statement as a string I want to extract >>> all the parts like, tableName, KeySpaceName, regular Columns, >>> partitionKey, ClusteringKey, Clustering Order and so on. Thats >>> really it!>>> >>> Thanks! >>> >>> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh >>> <rahul.xavier.si...@gmail.com> wrote:>>>> I think I understand what you are >>> trying to do … but what is your >>>> goal? What do you mean “use it for different” queries… Maybe you >>>> want to do an event and have an event processor? Seems like you are >>>> trying to basically by pass that pattern and parse a query and >>>> split it into several actions?>>>> >>>> Did you look into this unit test folder? 
>>>> >>>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java>>>> >>>> >>>> -- >>>> Rahul Singh >>>> rahul.si...@anant.us >>>> >>>> Anant Corporation >>>> >>>> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, >>>> wrote:>>>> >>>>> Hi All, >>>>> >>>>> I have a need where I get a raw CQL create table statement as a >>>>> String and I need to parse the keyspace, tablename, columns and so >>>>> on..so I can use it for various queries and send it to C*. I used >>>>> the example below from this link[1]. I get the following error. >>>>> And I thought maybe someone in this mailing list will be more >>>>> familiar with internals.>>>>> >>>>> Exception in thread "main" >>>>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace >>>>> test_keyspace doesn't exist>>>>> at >>>>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawS- >>>>> tatement.prepare(Cr
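The reflection helper Ariel describes can be sketched without any Cassandra dependency. `ParsedStatementStub` below is a hypothetical stand-in for the real parsed-statement object (e.g. `CreateTableStatement.RawStatement`); the helper itself is just plain `java.lang.reflect`:

```java
import java.lang.reflect.Field;

public class PrivateFieldAccess {

    // Hypothetical stand-in for a parsed statement whose fields are private.
    static class ParsedStatementStub {
        private final String keyspace = "test_keyspace";
    }

    // Grab a private field out of an object by name, as described in the thread.
    @SuppressWarnings("unchecked")
    static <T> T getPrivateField(Object target, String name) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return (T) f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String ks = getPrivateField(new ParsedStatementStub(), "keyspace");
        System.out.println(ks); // prints "test_keyspace"
    }
}
```

As noted in the thread, this leans on internals that were never meant as a library, so it can break on any Cassandra upgrade.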
Re: How to Parse raw CQL text?
I had to do something similar recently. Take a look at org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some sample code here [1] as well as a blog post [2] that explains how to access the private variables, since there's no access provided. It wasn't really designed to be used as a library, so YMMV with future changes. [1] https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt [2] http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/ On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali <k...@peernova.com> wrote: > I just did some trial and error. Looks like this would work > > public class Test { > > public static void main(String[] args) throws Exception { > > String stmt = "create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"; > ANTLRStringStream stringStream = new ANTLRStringStream(stmt); > CqlLexer cqlLexer = new CqlLexer(stringStream); > CommonTokenStream token = new CommonTokenStream(cqlLexer); > CqlParser parser = new CqlParser(token); > > ParsedStatement query = parser.cqlStatement(); > > > if (query.getClass().getDeclaringClass() == > CreateTableStatement.class) { > CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; > > CFMetaData > .compile(stmt, cts.keyspace()) > > > .getColumnMetadata() > .values() > .stream() > .forEach(cd -> System.out.println(cd)); > > } > >} > > } > > > On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: > >> Hi Anant, >> >> I just have CQL create table statement as a string I want to extract all >> the parts like, tableName, KeySpaceName, regular Columns, partitionKey, >> ClusteringKey, Clustering Order and so on. Thats really it! >> >> Thanks! 
>> >> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com >> > wrote: >> >>> I think I understand what you are trying to do … but what is your goal? >>> What do you mean “use it for different” queries… Maybe you want to do an >>> event and have an event processor? Seems like you are trying to basically >>> by pass that pattern and parse a query and split it into several actions? >>> >>> Did you look into this unit test folder? >>> >>> >>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java >>> >>> -- >>> Rahul Singh >>> rahul.si...@anant.us >>> >>> Anant Corporation >>> >>> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: >>> >>> Hi All, >>> >>> I have a need where I get a raw CQL create table statement as a String >>> and I need to parse the keyspace, tablename, columns and so on..so I can >>> use it for various queries and send it to C*. I used the example below >>> from this link <https://github.com/tacoo/cassandra-antlr-sample>. I get >>> the following error. And I thought maybe someone in this mailing list will >>> be more familiar with internals. >>> >>> Exception in thread "main" >>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace >>> test_keyspace doesn't exist >>> at >>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200) >>> at com.hello.world.Test.main(Test.java:23) >>> >>> >>> Here is my code. 
>>> >>> package com.hello.world; >>> >>> import org.antlr.runtime.ANTLRStringStream; >>> import org.antlr.runtime.CommonTokenStream; >>> import org.apache.cassandra.cql3.CqlLexer; >>> import org.apache.cassandra.cql3.CqlParser; >>> import org.apache.cassandra.cql3.statements.CreateTableStatement; >>> import org.apache.cassandra.cql3.statements.ParsedStatement; >>> >>> public class Test { >>> >>> public static void main(String[] args) throws Exception { >>> String stmt = "create table if not exists test_keyspace.my_table >>> (field1 text, field2 int, field3 set, field4 map<ascii, text>, >>> primary key (field1) );"; >>> ANTLRStringStream st
Re: How to Parse raw CQL text?
I just did some trial and error. Looks like this would work public class Test { public static void main(String[] args) throws Exception { String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary key (field1) );"; ANTLRStringStream stringStream = new ANTLRStringStream(stmt); CqlLexer cqlLexer = new CqlLexer(stringStream); CommonTokenStream token = new CommonTokenStream(cqlLexer); CqlParser parser = new CqlParser(token); ParsedStatement query = parser.cqlStatement(); if (query.getClass().getDeclaringClass() == CreateTableStatement.class) { CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query; CFMetaData .compile(stmt, cts.keyspace()) .getColumnMetadata() .values() .stream() .forEach(cd -> System.out.println(cd)); } } } On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali <k...@peernova.com> wrote: > Hi Anant, > > I just have CQL create table statement as a string I want to extract all > the parts like, tableName, KeySpaceName, regular Columns, partitionKey, > ClusteringKey, Clustering Order and so on. Thats really it! > > Thanks! > > On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> > wrote: > >> I think I understand what you are trying to do … but what is your goal? >> What do you mean “use it for different” queries… Maybe you want to do an >> event and have an event processor? Seems like you are trying to basically >> by pass that pattern and parse a query and split it into several actions? >> >> Did you look into this unit test folder? 
>> >> https://github.com/apache/cassandra/blob/trunk/test/unit/ >> org/apache/cassandra/cql3/CQLTester.java >> >> -- >> Rahul Singh >> rahul.si...@anant.us >> >> Anant Corporation >> >> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: >> >> Hi All, >> >> I have a need where I get a raw CQL create table statement as a String >> and I need to parse the keyspace, tablename, columns and so on..so I can >> use it for various queries and send it to C*. I used the example below >> from this link <https://github.com/tacoo/cassandra-antlr-sample>. I get >> the following error. And I thought maybe someone in this mailing list will >> be more familiar with internals. >> >> Exception in thread "main" >> org.apache.cassandra.exceptions.ConfigurationException: >> Keyspace test_keyspace doesn't exist >> at org.apache.cassandra.cql3.statements.CreateTableStatement$Ra >> wStatement.prepare(CreateTableStatement.java:200) >> at com.hello.world.Test.main(Test.java:23) >> >> >> Here is my code. 
>> >> package com.hello.world; >> >> import org.antlr.runtime.ANTLRStringStream; >> import org.antlr.runtime.CommonTokenStream; >> import org.apache.cassandra.cql3.CqlLexer; >> import org.apache.cassandra.cql3.CqlParser; >> import org.apache.cassandra.cql3.statements.CreateTableStatement; >> import org.apache.cassandra.cql3.statements.ParsedStatement; >> >> public class Test { >> >> public static void main(String[] args) throws Exception { >> String stmt = "create table if not exists test_keyspace.my_table >> (field1 text, field2 int, field3 set, field4 map<ascii, text>, >> primary key (field1) );"; >> ANTLRStringStream stringStream = new ANTLRStringStream(stmt); >> CqlLexer cqlLexer = new CqlLexer(stringStream); >> CommonTokenStream token = new CommonTokenStream(cqlLexer); >> CqlParser parser = new CqlParser(token); >> ParsedStatement query = parser.query(); >> if (query.getClass().getDeclaringClass() == >> CreateTableStatement.class) { >> CreateTableStatement.RawStatement cts = >> (CreateTableStatement.RawStatement) query; >> System.out.println(cts.keyspace()); >> System.out.println(cts.columnFamily()); >> ParsedStatement.Prepared prepared = cts.prepare(); >> CreateTableStatement cts2 = (CreateTableStatement) >> prepared.statement; >> cts2.getCFMetaData() >> .getColumnMetadata() >> .values() >> .stream() >> .forEach(cd -> System.out.println(cd)); >> } >> } >> } >> >> Thanks! >> >> >
Re: How to Parse raw CQL text?
Hi Anant, I just have CQL create table statement as a string I want to extract all the parts like, tableName, KeySpaceName, regular Columns, partitionKey, ClusteringKey, Clustering Order and so on. Thats really it! Thanks! On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote: > I think I understand what you are trying to do … but what is your goal? > What do you mean “use it for different” queries… Maybe you want to do an > event and have an event processor? Seems like you are trying to basically > by pass that pattern and parse a query and split it into several actions? > > Did you look into this unit test folder? > > https://github.com/apache/cassandra/blob/trunk/test/ > unit/org/apache/cassandra/cql3/CQLTester.java > > -- > Rahul Singh > rahul.si...@anant.us > > Anant Corporation > > On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: > > Hi All, > > I have a need where I get a raw CQL create table statement as a String and > I need to parse the keyspace, tablename, columns and so on..so I can use it > for various queries and send it to C*. I used the example below from this > link <https://github.com/tacoo/cassandra-antlr-sample>. I get the > following error. And I thought maybe someone in this mailing list will be > more familiar with internals. > > Exception in thread "main" > org.apache.cassandra.exceptions.ConfigurationException: > Keyspace test_keyspace doesn't exist > at org.apache.cassandra.cql3.statements.CreateTableStatement$Ra > wStatement.prepare(CreateTableStatement.java:200) > at com.hello.world.Test.main(Test.java:23) > > > Here is my code. 
> > package com.hello.world; > > import org.antlr.runtime.ANTLRStringStream; > import org.antlr.runtime.CommonTokenStream; > import org.apache.cassandra.cql3.CqlLexer; > import org.apache.cassandra.cql3.CqlParser; > import org.apache.cassandra.cql3.statements.CreateTableStatement; > import org.apache.cassandra.cql3.statements.ParsedStatement; > > public class Test { > > public static void main(String[] args) throws Exception { > String stmt = "create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"; > ANTLRStringStream stringStream = new ANTLRStringStream(stmt); > CqlLexer cqlLexer = new CqlLexer(stringStream); > CommonTokenStream token = new CommonTokenStream(cqlLexer); > CqlParser parser = new CqlParser(token); > ParsedStatement query = parser.query(); > if (query.getClass().getDeclaringClass() == > CreateTableStatement.class) { > CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; > System.out.println(cts.keyspace()); > System.out.println(cts.columnFamily()); > ParsedStatement.Prepared prepared = cts.prepare(); > CreateTableStatement cts2 = (CreateTableStatement) > prepared.statement; > cts2.getCFMetaData() > .getColumnMetadata() > .values() > .stream() > .forEach(cd -> System.out.println(cd)); > } > } > } > > Thanks! > >
Re: How to Parse raw CQL text?
I think I understand what you are trying to do … but what is your goal? What do you mean “use it for different” queries… Maybe you want to do an event and have an event processor? Seems like you are trying to basically by pass that pattern and parse a query and split it into several actions? Did you look into this unit test folder? https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 5, 2018, 4:06 PM -0500, Kant Kodali <k...@peernova.com>, wrote: > Hi All, > > I have a need where I get a raw CQL create table statement as a String and I > need to parse the keyspace, tablename, columns and so on..so I can use it for > various queries and send it to C*. I used the example below from this link. I > get the following error. And I thought maybe someone in this mailing list > will be more familiar with internals. > > Exception in thread "main" > org.apache.cassandra.exceptions.ConfigurationException: Keyspace > test_keyspace doesn't exist > at > org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200) > at com.hello.world.Test.main(Test.java:23) > > > Here is my code. 
> > package com.hello.world; > > import org.antlr.runtime.ANTLRStringStream; > import org.antlr.runtime.CommonTokenStream; > import org.apache.cassandra.cql3.CqlLexer; > import org.apache.cassandra.cql3.CqlParser; > import org.apache.cassandra.cql3.statements.CreateTableStatement; > import org.apache.cassandra.cql3.statements.ParsedStatement; > > public class Test { > >public static void main(String[] args) throws Exception { >String stmt = "create table if not exists test_keyspace.my_table > (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary > key (field1) );"; >ANTLRStringStream stringStream = new ANTLRStringStream(stmt); >CqlLexer cqlLexer = new CqlLexer(stringStream); >CommonTokenStream token = new CommonTokenStream(cqlLexer); >CqlParser parser = new CqlParser(token); >ParsedStatement query = parser.query(); >if (query.getClass().getDeclaringClass() == > CreateTableStatement.class) { >CreateTableStatement.RawStatement cts = > (CreateTableStatement.RawStatement) query; >System.out.println(cts.keyspace()); >System.out.println(cts.columnFamily()); >ParsedStatement.Prepared prepared = cts.prepare(); >CreateTableStatement cts2 = (CreateTableStatement) > prepared.statement; >cts2.getCFMetaData() >.getColumnMetadata() >.values() >.stream() >.forEach(cd -> System.out.println(cd)); >} >} > } > Thanks!
How to Parse raw CQL text?
Hi All, I have a need where I get a raw CQL create table statement as a String and I need to parse the keyspace, tablename, columns and so on..so I can use it for various queries and send it to C*. I used the example below from this link <https://github.com/tacoo/cassandra-antlr-sample>. I get the following error. And I thought maybe someone in this mailing list will be more familiar with internals. Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Keyspace test_keyspace doesn't exist at org.apache.cassandra.cql3.statements.CreateTableStatement$ RawStatement.prepare(CreateTableStatement.java:200) at com.hello.world.Test.main(Test.java:23) Here is my code. package com.hello.world; import org.antlr.runtime.ANTLRStringStream; import org.antlr.runtime.CommonTokenStream; import org.apache.cassandra.cql3.CqlLexer; import org.apache.cassandra.cql3.CqlParser; import org.apache.cassandra.cql3.statements.CreateTableStatement; import org.apache.cassandra.cql3.statements.ParsedStatement; public class Test { public static void main(String[] args) throws Exception { String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map<ascii, text>, primary key (field1) );"; ANTLRStringStream stringStream = new ANTLRStringStream(stmt); CqlLexer cqlLexer = new CqlLexer(stringStream); CommonTokenStream token = new CommonTokenStream(cqlLexer); CqlParser parser = new CqlParser(token); ParsedStatement query = parser.query(); if (query.getClass().getDeclaringClass() == CreateTableStatement.class) { CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query; System.out.println(cts.keyspace()); System.out.println(cts.columnFamily()); ParsedStatement.Prepared prepared = cts.prepare(); CreateTableStatement cts2 = (CreateTableStatement) prepared.statement; cts2.getCFMetaData() .getColumnMetadata() .values() .stream() .forEach(cd -> System.out.println(cd)); } } } Thanks!
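For reference, if all that is needed is the keyspace and table name, a rough regex sketch avoids the Cassandra internals (and the `prepare()`-time keyspace check) entirely. This is emphatically not the real CQL grammar: it ignores quoted identifiers, unqualified names, and many valid statements, and the class and method names below are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CreateTableSketch {

    // Very rough pattern: "create table [if not exists] keyspace.table (..."
    private static final Pattern CREATE_TABLE = Pattern.compile(
        "(?i)create\\s+table\\s+(?:if\\s+not\\s+exists\\s+)?(\\w+)\\.(\\w+)\\s*\\(");

    // Returns {keyspace, table} or throws if the statement doesn't match.
    public static String[] keyspaceAndTable(String cql) {
        Matcher m = CREATE_TABLE.matcher(cql);
        if (!m.find()) {
            throw new IllegalArgumentException("not a qualified CREATE TABLE: " + cql);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String stmt = "create table if not exists test_keyspace.my_table "
            + "(field1 text, field2 int, primary key (field1));";
        String[] kt = keyspaceAndTable(stmt);
        System.out.println(kt[0] + " / " + kt[1]); // prints "test_keyspace / my_table"
    }
}
```

Anything beyond names (column types, primary key structure, clustering order) really does need a proper parser, which is why the thread keeps coming back to `QueryProcessor.parseStatement()`.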
Re: Golang + Cassandra + Text Search
https://github.com/Stratio/cassandra-lucene-index is another option - it plugs a full Lucene engine into Cassandra's custom secondary index interface. If you only need text prefix/postfix/substring matching or basic tokenization there is SASI. On Wed, 25 Oct 2017 at 03:50 Who Dadddy <qwerty15...@gmail.com> wrote: > Ridley - have a look at Elassandra > https://github.com/strapdata/elassandra > > > On 24 Oct 2017, at 06:50, Ridley Submission < > ridley.submission2...@gmail.com> wrote: > > Hi, > > Quick question, I am wondering if anyone here who works with Go has > specific recommendations for as simple framework to add text search on top > of cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley > > > -- *Justin Cameron*Senior Software Engineer <https://www.instaclustr.com/> This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
Re: Golang + Cassandra + Text Search
Ridley - have a look at Elassandra https://github.com/strapdata/elassandra <https://github.com/strapdata/elassandra> > On 24 Oct 2017, at 06:50, Ridley Submission <ridley.submission2...@gmail.com> > wrote: > > Hi, > > Quick question, I am wondering if anyone here who works with Go has specific > recommendations for as simple framework to add text search on top of > cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley
Re: Golang + Cassandra + Text Search
When someone talks about full text search, I usually assume there's more required than keyword search (i.e. simple tokenization and a little stemming):

* Term Vectors, commonly used for a "more like this" feature
* Ranking of search results
* Facets
* More complex tokenization, like trigrams

So anyway, I don't know if the OP had those requirements, but it's important to keep them in mind. > On Oct 24, 2017, at 1:33 AM, DuyHai Doan <doanduy...@gmail.com> wrote: > > There is already a full text search index in Cassandra called SASI > > On Tue, Oct 24, 2017 at 6:50 AM, Ridley Submission > <ridley.submission2...@gmail.com <mailto:ridley.submission2...@gmail.com>> > wrote: > Hi, > > Quick question, I am wondering if anyone here who works with Go has specific > recommendations for as simple framework to add text search on top of > cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley >
Re: Golang + Cassandra + Text Search
There is already a full text search index in Cassandra called SASI On Tue, Oct 24, 2017 at 6:50 AM, Ridley Submission < ridley.submission2...@gmail.com> wrote: > Hi, > > Quick question, I am wondering if anyone here who works with Go has > specific recommendations for as simple framework to add text search on top > of cassandra? > > (Apologies if this is off topic—I am not quite sure what forum in the > cassandra community would be best for this type of question) > > Thanks, > Riley >
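For anyone landing on this thread later: a SASI index is created through CQL's custom index syntax. The sketch below is illustrative only; the table and index names are made up, and the options shown are the commonly documented ones for tokenized substring matching (verify against your Cassandra version; SASI shipped with 3.4+).

```sql
-- Hypothetical table; names are made up for illustration.
CREATE TABLE posts (id uuid PRIMARY KEY, body text);

-- CONTAINS mode plus an analyzer enables tokenized LIKE '%term%' queries.
CREATE CUSTOM INDEX posts_body_idx ON posts (body)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzed': 'true',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer'
};

SELECT id FROM posts WHERE body LIKE '%cassandra%';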
Golang + Cassandra + Text Search
Hi, Quick question, I am wondering if anyone here who works with Go has specific recommendations for a simple framework to add text search on top of Cassandra? (Apologies if this is off topic—I am not quite sure what forum in the Cassandra community would be best for this type of question) Thanks, Riley
Re: Cassandra blob vs base64 text
You could save space when storing your data (base64-)decoded as blobs. 2017-02-20 13:38 GMT+01:00 Oskar Kjellin <oskar.kjel...@gmail.com>: > We currently have some cases where we store base64 as a text field instead > of a blob (running version 2.0.17). > I would like to move these to blob but wondering what benefits and > optimizations there are? The possible ones I can think of is (but there's > probably more): > > * blob is stored as off heap ByteBuffers? > * blob won't be decompressed server side? > > Are there any other reasons to switch to blobs? Or are we not going to see > any difference? > > Thanks! >
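To put a rough number on the space point (a standalone sketch, not Cassandra-specific): base64 emits 4 output bytes for every 3 input bytes, so a text column holding base64 is about a third larger than a blob holding the decoded bytes, before any compression.

```python
import base64
import os

raw = os.urandom(3000)           # arbitrary binary payload
encoded = base64.b64encode(raw)  # what a base64 text column would store

# 4 output bytes per 3 input bytes: 3000 raw bytes -> 4000 encoded bytes
overhead = len(encoded) / len(raw)
print(len(raw), len(encoded), overhead)
```

This is the storage argument only; the off-heap and server-side decompression questions from the original post are separate concerns.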
Cassandra blob vs base64 text
We currently have some cases where we store base64 as a text field instead of a blob (running version 2.0.17). I would like to move these to blob but am wondering what benefits and optimizations there are? The possible ones I can think of are (but there are probably more): * blob is stored as off-heap ByteBuffers? * blob won't be decompressed server side? Are there any other reasons to switch to blobs? Or are we not going to see any difference? Thanks!
Re: UDA can't use int or text as state_type
Problem solved! INITCOND {} should be INITCOND 0. ---- Original message ---- From: lowping <lowp...@163.com> To: user <u...@cassandra.apache.org> Sent: Mon, 27 Jun 2016 16:03 Subject: UDA can't use int or text as state_type Hi all, I hit a problem today when I created a UDA like this; I hope you can help me solve it: CREATE OR REPLACE FUNCTION sum_fun(state int, type text) // if the state type is SET or MAP, this works CALLED ON NULL INPUT RETURNS int LANGUAGE java AS 'return Integer.parseInt(type)+state;' ; CREATE OR REPLACE AGGREGATE aggr_sum(text) SFUNC sum_fun STYPE int INITCOND {}; error message: InvalidRequest: code=2200 [Invalid query] message="Invalid set literal for (aggregate_initcond) of type int" cassandra version: 2.2
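Spelled out, the working definitions from this thread look like the following (the function body is unchanged from the original post; only the INITCOND differs):

```sql
CREATE OR REPLACE FUNCTION sum_fun(state int, type text)
    CALLED ON NULL INPUT
    RETURNS int
    LANGUAGE java
    AS 'return Integer.parseInt(type) + state;';

-- STYPE is int, so INITCOND must be an int literal.
-- {} is a collection literal, which only type-checks when the state
-- type is a SET or MAP, hence the "Invalid set literal" error.
CREATE OR REPLACE AGGREGATE aggr_sum(text)
    SFUNC sum_fun
    STYPE int
    INITCOND 0;
```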
UDA can't use int or text as state_type
Hi all, I hit a problem today when I created a UDA like this; I hope you can help me solve it: CREATE OR REPLACE FUNCTION sum_fun(state int, type text) // if the state type is SET or MAP, this works CALLED ON NULL INPUT RETURNS int LANGUAGE java AS 'return Integer.parseInt(type)+state;' ; CREATE OR REPLACE AGGREGATE aggr_sum(text) SFUNC sum_fun STYPE int INITCOND {}; error message: InvalidRequest: code=2200 [Invalid query] message="Invalid set literal for (aggregate_initcond) of type int" cassandra version: 2.2
Store JSON as text or UTF-8 encoded blobs?
Hey. I'm considering migrating my DB from using multiple columns to just 2 columns, with the second one being a JSON object. Is there going to be any real difference between TEXT and a UTF-8 encoded BLOB? I guess it would probably be easier to get tools like Spark to parse the object as JSON if it's represented as a BLOB. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
text partition key Bloom filters fp is 1 always, why?
Hello, I have a text partition key for one of the CFs. cfstats on that table seems to show that the bloom filter false positive ratio is always 1. The bloom filter is also using very little space. Do bloom filters not work well with text partition keys? I assume this because the filter has no way to detect the length of the text and hence would have a very high false-positive rate. The text partition key is built as long + '_' + epoch_time_in_hours; would it be better to have a composite partition key of (long, epoch_time_in_hours) rather than combining them into a text key? Thanks, anishek
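No answer was posted to this thread, but for context on what drives the ratio: a Bloom filter hashes the serialized partition key to fixed-size values, so the key's type and length don't enter into it; the expected false-positive rate depends only on the number of inserted keys, the bitmap size, and the hash count. A minimal sketch of the standard estimate (the parameter values below are illustrative, not Cassandra defaults):

```python
import math

def bloom_fp(n_keys: int, m_bits: int, k_hashes: int) -> float:
    """Expected false-positive probability: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes

# ~10 bits per key with k = 7 hashes keeps the fp rate under 1%,
# regardless of whether the keys are text, longs, or composites.
fp = bloom_fp(n_keys=1_000_000, m_bits=10_000_000, k_hashes=7)
print(f"{fp:.4f}")
```

A ratio of exactly 1 with very little space used therefore points at the filter's sizing or metrics, not at the key being text.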
efficiently generate complete database dump in text format
Hi, We have a Cassandra column family containing 320 million rows, and each row contains about 15 columns. We want to take a monthly dump of this single column family in text format. We are planning the following approach: 1. Take a snapshot of the Cassandra database using the nodetool utility, specifying the -cf flag so that the snapshot contains data for a single column family. 2. Take a backup of this snapshot and move it to a separate physical machine. 3. Use the SSTable-to-JSON conversion utility to convert all the data files into JSON format. We have the following questions/doubts regarding this approach: a) Generated JSON records contain a d (IS_MARKED_FOR_DELETE) flag; can I safely ignore all such records? b) If I ignore all records marked by the d flag, can the JSON files generated in step 3 still contain duplicate records, i.e. multiple entries for the same key? Is there any better approach to generate data dumps in text format? Regards, Gaurav
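On questions a) and b): a deletion marker can shadow an older live value in a different SSTable, so tombstoned records can't simply be dropped before duplicates are merged. The sketch below shows the merge order on a made-up JSON shape loosely modeled on sstable2json's [name, value, timestamp, flags] column arrays; the exact output format varies by Cassandra version, so treat the field layout as an assumption and check it against your dumps.

```python
def merge_rows(rows):
    """Merge rows that share a key across SSTable dumps: the newest
    timestamp wins per column, and tombstones ('d' marker) are applied
    during the merge, then stripped from the output."""
    merged = {}  # key -> {column_name: (timestamp, value_or_None)}
    for row in rows:
        cols = merged.setdefault(row["key"], {})
        for col in row["columns"]:
            name, value, ts = col[0], col[1], col[2]
            deleted = len(col) > 3 and col[3] == "d"
            if name not in cols or ts > cols[name][0]:
                cols[name] = (ts, None if deleted else value)
    # Strip tombstones only after the newest-wins merge, not before:
    # a tombstone may shadow an older live value in another SSTable.
    return {
        key: {n: v for n, (ts, v) in cols.items() if v is not None}
        for key, cols in merged.items()
    }

# Two dumps of the same key: column 'a' was updated, 'b' was deleted later.
rows = [
    {"key": "k1", "columns": [["a", "old", 1], ["b", "x", 1]]},
    {"key": "k1", "columns": [["a", "new", 2], ["b", "", 3, "d"]]},
]
print(merge_rows(rows))  # {'k1': {'a': 'new'}}
```

So: records with the d flag can be ignored in the final output, but only after they have been allowed to suppress the older duplicates they shadow.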
Re: efficiently generate complete database dump in text format
The best way to generate dumps from Cassandra is via Hadoop integration (or spark). You can find more info here: http://www.datastax.com/documentation/cassandra/2.1/cassandra/configuration/configHadoop.html http://wiki.apache.org/cassandra/HadoopSupport On Thu, Oct 9, 2014 at 4:19 AM, Gaurav Bhatnagar gbhatna...@gmail.com wrote: Hi, We have a Cassandra database column family containing 320 millions rows and each row contains about 15 columns. We want to take monthly dump of this single column family contained in this database in text format. We are planning to take following approach to implement this functionality 1. Take a snapshot of Cassandra database using nodetool utility. We specify -cf flag to specify column family name so that snapshot contains data corresponding to a single column family. 2. We take backup of this snapshot and move this backup to a separate physical machine. 3. We using SStable to json conversion utility to json convert all the data files into json format. We have following questions/doubts regarding the above approach a) Generated json records contains d (IS_MARKED_FOR_DELETE) flag in json record and can I safely ignore all such json records? b) If I ignore all records marked by d flag, than can generated json files in step 3, contain duplicate records? I mean do multiple entries for same key. Do there can be any other better approach to generate data dumps in text format. Regards, Gaurav -- *Paulo Motta* Chaordic | *Platform* *www.chaordic.com.br http://www.chaordic.com.br/* +55 48 3232.3200
Re: efficiently generate complete database dump in text format
You might also want to consider tools like https://github.com/Netflix/aegisthus for the last step, which can help you deal with tombstones and de-duplicate data. Thanks, Daniel On Thu, Oct 9, 2014 at 12:19 AM, Gaurav Bhatnagar gbhatna...@gmail.com wrote: Hi, We have a Cassandra database column family containing 320 millions rows and each row contains about 15 columns. We want to take monthly dump of this single column family contained in this database in text format. We are planning to take following approach to implement this functionality 1. Take a snapshot of Cassandra database using nodetool utility. We specify -cf flag to specify column family name so that snapshot contains data corresponding to a single column family. 2. We take backup of this snapshot and move this backup to a separate physical machine. 3. We using SStable to json conversion utility to json convert all the data files into json format. We have following questions/doubts regarding the above approach a) Generated json records contains d (IS_MARKED_FOR_DELETE) flag in json record and can I safely ignore all such json records? b) If I ignore all records marked by d flag, than can generated json files in step 3, contain duplicate records? I mean do multiple entries for same key. Do there can be any other better approach to generate data dumps in text format. Regards, Gaurav
Re: is lack of full text search hurting cassandra and datastax?
There are some options around for full text search integration with C*. Google for Stratio Deep and Stargate. Both are open source. On 3 Oct 2014 06:31, Kevin Burton bur...@spinn3r.com wrote: So right now I have plenty of quality and robust full text search systems I can use. Solr cloud, elastic search. They all also have very robust UIs on top of them… kibana, banana, etc. and my alternative for cassandra is… paying for a proprietary database. Which might be fine for some parties… but I want something that is documented and supported by the community and all the advantages of open source. So is DSE really giving Datastax that much of a win? I'm sure they are making money off it… and I hope they're successful of course. But I can't help feeling that cassandra as an open source project is being hindered by lack of a full text option. Additionally, some people can get away with storing the content directly in a full text system and skipping the cassandra route altogether. Seems like a situation without many winners… -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
Re: is lack of full text search hurting cassandra and datastax?
You can use also Stratio Cassandra https://github.com/Stratio/stratio-cassandra, which is an open source fork of Cassandra with Lucene based full text search capabilities. -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*
Re: is lack of full text search hurting cassandra and datastax?
And meanwhile, DataStax will continue to invest in and promote and support full text search of your Cassandra data with our tight integration of Solr in DataStax Enterprise. BTW, there is in fact very strong interest in DataStax Enterprise, and not just as "support" for raw Cassandra, so I'm not so sure that I'd worry too much that DataStax is "hurting" in any way! And to be clear, even with DataStax Enterprise, your data is still stored in the same sstables that you've come to love in Cassandra, and with the same CQL API as well, so your data is in no way trapped in... a "proprietary database." Our Solr indexing is in addition to the storage of your data in Cassandra. Sure, there will be a few losers in any significantly large and complex activity, but rest assured that the vast majority will be winners here. And as DuyHai notes, there are indeed open source options available as well. -- Jack Krupansky From: DuyHai Doan Sent: Friday, October 3, 2014 3:54 AM To: user@cassandra.apache.org Subject: Re: is lack of full text search hurting cassandra and datastax? There are some options around for full text search integration with C*. Google for Stratio Deep and Stargate. Both are open source. On 3 Oct 2014 06:31, Kevin Burton bur...@spinn3r.com wrote: So right now I have plenty of quality and robust full text search systems I can use. Solr cloud, elastic search. They all also have very robust UIs on top of them… kibana, banana, etc. and my alternative for cassandra is… paying for a proprietary database. Which might be fine for some parties… but I want something that is documented and supported by the community and all the advantages of open source. So is DSE really giving Datastax that much of a win? I'm sure they are making money off it… and I hope they're successful of course. But I can't help feeling that cassandra as an open source project is being hindered by lack of a full text option.
Additionally, some people can get away with storing the content directly in a full text system and skipping the cassandra route altogether. Seems like a situation without many winners… -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
is lack of full text search hurting cassandra and datastax?
So right now I have plenty of quality and robust full text search systems I can use: Solr Cloud, Elasticsearch. They all also have very robust UIs on top of them… Kibana, Banana, etc. And my alternative for Cassandra is… paying for a proprietary database. Which might be fine for some parties… but I want something that is documented and supported by the community, with all the advantages of open source. So is DSE really giving DataStax that much of a win? I'm sure they are making money off it… and I hope they're successful, of course. But I can't help feeling that Cassandra as an open source project is being hindered by the lack of a full text option. Additionally, some people can get away with storing the content directly in a full text system and skipping the Cassandra route altogether. Seems like a situation without many winners… -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
Re: Adding large text blob causes read timeout...
oh.. the difference between the ONE field and the remaining 29 is massive. It's like 200ms for just the 29 columns.. adding the extra one causes it to time out at 5000ms... On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan doanduy...@gmail.com wrote: Don't forget that when you do the Select with limit set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause a timeout. Try to: 1. Select only the big html column 2. Or reduce the limit incrementally until there is no timeout On 24 Jun 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: I have a table with a schema mostly of small fields. About 30 of them. The primary key is: primary key( bucket, sequence ) … I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000; … using the Java driver. If I add ALL the fields except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF8 encoded? I can't imagine decoding UTF8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers?
cqlsh doesn't time out … it actually works fine, but it uses 100% CPU while writing out the data, so it's not a good comparison unfortunately. Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ...:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172) at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92) at com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: dev4.wdc.sl.spinn3r.com/10.24.23.94:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Adding large text blob causes read timeout...
Yes, but the extra column ends up multiplied by 1000: the LIMIT in CQL3 specifies the number of logical rows, not the number of physical columns in the storage engine. On 24 Jun 2014 08:30, Kevin Burton bur...@spinn3r.com wrote: oh.. the difference between the ONE field and the remaining 29 is massive. It's like 200ms for just the 29 columns.. adding the extra one causes it to time out at 5000ms... On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan doanduy...@gmail.com wrote: Don't forget that when you do the Select with limit set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause a timeout. Try to: 1. Select only the big html column 2. Or reduce the limit incrementally until there is no timeout On 24 Jun 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: I have a table with a schema mostly of small fields. About 30 of them. The primary key is: primary key( bucket, sequence ) … I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000; … using the Java driver. If I add ALL the fields except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF8 encoded? I can't imagine decoding UTF8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers?
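DuyHai's arithmetic explains the timeout: LIMIT counts logical rows, so every selected column is fetched 1000 times. With assumed sizes (the ~100 KB html figure is a guess, not from the thread), the payload jumps by roughly two orders of magnitude:

```python
rows = 1000            # LIMIT 1000 -> 1000 logical rows
small_cols = 29
avg_small = 50         # assumed bytes per small column
avg_html = 100 * 1024  # assumed bytes per html snapshot

without_html = rows * small_cols * avg_small  # ~1.4 MB for 29 small columns
with_html = without_html + rows * avg_html    # ~104 MB once html is selected
print(without_html, with_html, with_html / without_html)
```

A ~70x larger response on the same timeout budget is enough to push a 200ms query past 5000ms, independent of any UTF-8 decoding cost.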
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
Assuming we're talking about the DataStax Java driver: getBytes will throw an exception, because it validates that the column is of type BLOB. But you can use getBytesUnsafe: ByteBuffer b = row.getBytesUnsafe("aTextColumn"); // if you want to check it: Charset.forName("UTF-8").decode(b); Regarding whether this will continue working in the future: from the driver's perspective, the fact that the native protocol uses UTF-8 is an implementation detail, but I doubt this will change any time soon. On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan doanduy...@gmail.com wrote: Good idea, bytes are barely processed by the server, so you're saving a lot of CPU. AFAIK getBytes should work fine. On 24 Jun 2014 05:50, Kevin Burton bur...@spinn3r.com wrote: I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the past. So I try to avoid transparent encoding/decoding if I can avoid it. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... and of course the question is whether it will continue working in the future :-P I'll write a test of it of course but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
You can use getBytesUnsafe on the UTF8 column -- Sent from my iPhone On 24.06.2014 at 09:13, Olivier Michallat olivier.michal...@datastax.com wrote: Assuming we're talking about the DataStax Java driver: getBytes will throw an exception, because it validates that the column is of type BLOB. But you can use getBytesUnsafe: ByteBuffer b = row.getBytesUnsafe("aTextColumn"); // if you want to check it: Charset.forName("UTF-8").decode(b); Regarding whether this will continue working in the future: from the driver's perspective, the fact that the native protocol uses UTF-8 is an implementation detail, but I doubt this will change any time soon. On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan doanduy...@gmail.com wrote: Good idea, bytes are barely processed by the server, so you're saving a lot of CPU. AFAIK getBytes should work fine. On 24 Jun 2014 05:50, Kevin Burton bur...@spinn3r.com wrote: I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the past. So I try to avoid transparent encoding/decoding if I can avoid it. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... and of course the question is whether it will continue working in the future :-P I'll write a test of it of course but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Adding large text blob causes read timeout...
Can you do your query in the CLI after setting tracing on? On Mon, Jun 23, 2014 at 11:32 PM, DuyHai Doan doanduy...@gmail.com wrote: Yes, but the extra column ends up multiplied by 1000: the LIMIT in CQL3 specifies the number of logical rows, not the number of physical columns in the storage engine. On 24 Jun 2014 08:30, Kevin Burton bur...@spinn3r.com wrote: oh.. the difference between the ONE field and the remaining 29 is massive. It's like 200ms for just the 29 columns.. adding the extra one causes it to time out at 5000ms... On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan doanduy...@gmail.com wrote: Don't forget that when you do the Select with limit set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause a timeout. Try to: 1. Select only the big html column 2. Or reduce the limit incrementally until there is no timeout On 24 Jun 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: I have a table with a schema mostly of small fields. About 30 of them. The primary key is: primary key( bucket, sequence ) … I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000; … using the Java driver. If I add ALL the fields except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF8 encoded? I can't imagine decoding UTF8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers?
-- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
Yes… I confirmed that getBytesUnsafe works… I also have a unit test for it so if cassandra ever changes anything we'll pick it up. One point in your above code. I still think charsets are behind a synchronized code block. So your above code wouldn't be super fast on multi-core machines. I usually use guava's Charsets class since they have static references to all of them. … just wanted to point that out since it could bite someone :-P … On Tue, Jun 24, 2014 at 12:13 AM, Olivier Michallat olivier.michal...@datastax.com wrote: Assuming we're talking about the DataStax Java driver: getBytes will throw an exception, because it validates that the column is of type BLOB. But you can use getBytesUnsafe: ByteBuffer b = row.getBytesUnsafe(aTextColumn); // if you want to check it: Charset.forName(UTF-8).decode(b); Regarding whether this will continue working in the future: from the driver's perspective, the fact that the native protocol uses UTF-8 is an implementation detail, but I doubt this will change any time soon. On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan doanduy...@gmail.com wrote: Good idea, bytes are merely processed by the server so you're saving a lot of Cpu. AFAIK getBytes should work fine. Le 24 juin 2014 05:50, Kevin Burton bur...@spinn3r.com a écrit : I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the future. So I try to avoid transparent encoding/decoding if I can avoid it. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... 
and of course the question is whether it will continue working in the future :-P I'll write a test of it of course but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Can I call getBytes on a text column to get the raw (already encoded UTF8)
I'm building a webservice whereby I read the data from cassandra, then write it over the wire. It's going to push LOTS of content, and encoding/decoding performance has really bitten us in the past. So I try to avoid transparent encoding/decoding when I can. So right now, I have a huge blob of text that's a 'text' column. Logically it *should* be text, because that's what it is... Can I just keep it as text so our normal tools work on it, but get it as raw UTF8 if I call getBytes? This way I can call getBytes and then send it right over the wire as pre-encoded UTF8 data. ... and of course the question is whether it will continue working in the future :-P I'll write a test of it of course, but I wanted to see what you guys thought of this idea. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Adding large text blob causes read timeout...
I have a table with a schema mostly of small fields. About 30 of them. The primary key is:

    primary key( bucket, sequence )

… I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N, and get all the writes ordered by time. I'm running:

    SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000;

… using the Java driver. If I add ALL the fields, except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF-8 encoded? I can't imagine decoding UTF-8 is going to be THAT slow, but perhaps cassandra is doing something silly under the covers? cqlsh doesn't time out … it actually works fine, but it uses 100% CPU while writing out the data, so it's not a good comparison unfortunately.

    Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ...:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
        at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
        at com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100)
    Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: dev4.wdc.sl.spinn3r.com/10.24.23.94:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
        at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
        at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)
Good idea, bytes are merely processed by the server so you're saving a lot of CPU. AFAIK getBytes should work fine.

On 24 June 2014 05:50, Kevin Burton bur...@spinn3r.com wrote: …
Re: Adding large text blob causes read timeout...
Don't forget that when you do the SELECT with LIMIT set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause the timeout. Try to:

1. Select only the big html column
2. Or reduce the limit incrementally until there is no timeout

On 24 June 2014 06:22, Kevin Burton bur...@spinn3r.com wrote: …
com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided
I am trying to insert into a Cassandra database using the Datastax Java driver, but every time I get the below exception at the `prBatchInsert.bind` line:

    com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided

Below is my method, which accepts `userId` as the input and `attributes` as the `Map` which contains `key` as my `Column Name` and value as the actual value of that column:

    public void upsertAttributes(final String userId, final Map<String, String> attributes, final String columnFamily) {
        try {
            Set<String> keys = attributes.keySet();
            StringBuilder sqlPart1 = new StringBuilder(); // StringBuilder.append() is faster than concatenating Strings in a loop
            StringBuilder sqlPart2 = new StringBuilder();
            sqlPart1.append("INSERT INTO " + columnFamily + " (USER_ID");
            sqlPart2.append(") VALUES ( ?");
            for (String k : keys) {
                sqlPart1.append(", " + k);  // append each key
                sqlPart2.append(", ?");     // append an unknown value for each key
            }
            sqlPart2.append(") ");          // Last parenthesis (and space?)
            String sql = sqlPart1.toString() + sqlPart2.toString();
            CassandraDatastaxConnection.getInstance();
            PreparedStatement prBatchInsert = CassandraDatastaxConnection.getSession().prepare(sql);
            prBatchInsert.setConsistencyLevel(ConsistencyLevel.ONE);
            // this line is giving me an exception
            BoundStatement query = prBatchInsert.bind(userId, attributes.values().toArray(new Object[attributes.size()])); // Vararg methods can take an array (might need to cast it to String[]?)
            CassandraDatastaxConnection.getSession().executeAsync(query);
        } catch (InvalidQueryException e) {
            LOG.error("Invalid Query Exception in CassandraDatastaxClient::upsertAttributes " + e);
        } catch (Exception e) {
            LOG.error("Exception in CassandraDatastaxClient::upsertAttributes " + e);
        }
    }

What am I doing wrong here? Any thoughts?
Re: com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided
As the comment in your code suggests, you need to cast the array passed to the bind method to Object[]. This is true any time you pass an array to a varargs method.

On Dec 7, 2013 4:01 PM, Techy Teck comptechge...@gmail.com wrote: …
Re: com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided
    BoundStatement query = prBatchInsert.bind(userId, attributes.values().toArray(new *String*[attributes.size()]));

On 12/07/2013 03:59 PM, Techy Teck wrote: …
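A side note on the varargs mechanics, since the two suggested fixes look similar: when a varargs method is called with a leading scalar plus an array, the Java compiler wraps the array as a *single* element of the varargs array, which is exactly why the driver reports an Object[] where it expects a String. A self-contained sketch (the bind method below is a stand-in for illustration, not the driver's) showing the wrapping and one reliable fix, flattening everything into one array:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VarargsBindDemo {
    // Stand-in for PreparedStatement.bind(Object... values) -- hypothetical,
    // just to show how the compiler packages the arguments.
    static int bind(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("first", "a");
        attributes.put("last", "b");

        // Buggy shape: (String, Object[]) -> the array becomes ONE element,
        // so "value 1" is an Object[] where the driver expects a String.
        int buggy = bind("user1", attributes.values().toArray(new Object[0]));
        System.out.println(buggy); // 2, not 3

        // Fix: flatten userId and the values into a single Object[] and pass
        // it alone, so it is used directly as the varargs array.
        Object[] flat = new Object[attributes.size() + 1];
        flat[0] = "user1";
        int i = 1;
        for (String v : attributes.values()) {
            flat[i++] = v;
        }
        System.out.println(bind(flat)); // 3: user1, a, b
    }
}
```

Note that casting to Object[] (or String[]) only helps when the array is the *only* argument; with a leading userId the flattening above is needed.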
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
From: johnlu...@hotmail.com To: user@cassandra.apache.org Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type? Date: Wed, 9 Oct 2013 18:33:13 -0400

reduce method:

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++) {
            longByterray[i] = (byte) (recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put("recordid", recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

I finally got it working after finding the LongSerializer class source in cassandra; I see that the correct way to build a ByteBuffer key from a Long is:

    public ByteBuffer serialize(Long value) {
        return value == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : ByteBufferUtil.bytes(value);
    }

John
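For anyone hitting the same problem: ByteBufferUtil.bytes(long) is just an 8-byte big-endian encoding, and the original reducer's recordIdByteBuf.wrap(longByterray) discards its result because wrap is a *static* factory method, leaving the key buffer all zeros — which would explain the single row with key 0 seen later in the thread. A plain-java.nio sketch of both points (no Cassandra dependency; the behaviour of ByteBufferUtil is inferred from the serializer John quotes):

```java
import java.nio.ByteBuffer;

public class BigintKey {
    // Equivalent of Cassandra's ByteBufferUtil.bytes(long): 8 bytes, big-endian.
    static ByteBuffer bytes(long value) {
        ByteBuffer b = ByteBuffer.allocate(8); // ByteBuffer defaults to big-endian
        b.putLong(0, value);                   // absolute put: position stays 0
        return b;
    }

    public static void main(String[] args) {
        ByteBuffer key = bytes(0x47407826L);
        // Prints 0000000047407826: four zero bytes, then 0x47 0x40 0x78 0x26.
        for (int i = 0; i < 8; i++) {
            System.out.printf("%02x", key.get(i));
        }
        System.out.println();

        // Pitfall from the original reducer: ByteBuffer.wrap is static, so
        // calling it on an instance creates (and discards) a NEW buffer --
        // the original buffer stays all zeros.
        ByteBuffer wrong = ByteBuffer.allocate(8);
        wrong.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}); // result ignored!
        System.out.println(wrong.getLong(0)); // still 0
    }
}
```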
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
I don't know what happened to my original post but it got truncated. Let me try again:

software versions: apache-cassandra-2.0.1, hadoop-2.1.0-beta

I have been experimenting with using hadoop for a map/reduce operation on cassandra, outputting to the CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count except --- I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid))

and simply tried setting this key as one of the keys in the output map:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)

After trying to make it more similar to WordCount, I eventually realized the one difference was the datatype of the primary key of the output colfamily: WordCount has text; I had bigint. I changed mine to text:

    CREATE TABLE archive_recordids ( recordid text, count_num bigint, PRIMARY KEY (recordid))

and set the primary key *twice* in the reducer:

    keys.put("recordid", ByteBufferUtil.bytes(String.valueOf(recordid)));
    context.getConfiguration().set(PRIMARY_KEY, String.valueOf(recordid));

and it then worked perfectly. Is there a restriction in cassandra-hadoop-cql support that the output colfamily's primary key(s) must be text? And does that also apply to DELETE? Or am I doing it wrong? Or maybe there is some other OutputFormatter that I could use that would work? Cheers, John
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
From: johnlu...@hotmail.com To: user@cassandra.apache.org Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type? Date: Wed, 9 Oct 2013 09:40:06 -0400

software versions: apache-cassandra-2.0.1, hadoop-2.1.0-beta

I have been experimenting with using hadoop for a map/reduce operation on cassandra, outputting to the CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count except --- I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid))

and simply tried setting this key as one of the keys in the output map:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)

I managed to get a little bit further and my M/R program now runs to completion with output to the colfamily with bigint primary key, and actually does manage to UPDATE a row.

query:

    String query = "UPDATE " + keyspace + "." + OUTPUT_COLUMN_FAMILY + " SET count_num = ? ";

reduce method:

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++) {
            longByterray[i] = (byte) (recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put("recordid", recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

and my logger output does show it outputting maps containing what appear to be valid keys, e.g.

    writing key : 0x47407826 , hasarray ? : Y

there are about 74 mappings in the final reducer output, each with a different numeric record key.
but after the program completes, there is just one single row in the columnfamily, with a rowkey of 0 (zero):

    SELECT * FROM archive_recordids LIMIT 9;

     recordid | count_num
    ----------+-----------
            0 |         2

    (1 rows)

I guess it is something relating to the way my code is wrapping a long value into the ByteBuffer, or maybe the way the ByteBuffer is being allocated. As far as I can tell, the ByteBuffer needs to be populated in exactly the same way as a thrift application would populate a ByteBuffer for a bigint key -- does anyone know how to do that, or point me to an example that works? Thanks, John
cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
I have been experimenting with using hadoop for a map/reduce operation on cassandra, outputting to the CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count except --- I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid))

and simply tried setting this key as one of the keys in the output:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)
Keystore password in yaml is in plain text
Hi, is there a way to obfuscate the keystore/truststore password? Thanks, Shahryar
Problem with sstableloader from text data
Hi, following the article at http://www.datastax.com/dev/blog/bulk-loading , I developed a custom builder app to serialize a text file with rows in json format to an sstable. I managed to get the tool running and building the tables, however when I try to load them I get this error:

    sstableloader -d localhost demodb/
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.cassandra.io.sstable.SSTableLoader.<init>(SSTableLoader.java:64)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:64)

and when I try to decode the sstables to json I get this one:

    sstable2json demodb/demodb-positions8-jb-1-Data.db
    [
    {"key": "000800bae94e08013f188b9bd00400","columns": [Exception in thread "main" java.lang.IllegalArgumentException
        at java.nio.Buffer.limit(Buffer.java:267)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:230)
        at org.apache.cassandra.tools.SSTableExport.serializeColumn(SSTableExport.java:183)
        at org.apache.cassandra.tools.SSTableExport.serializeAtom(SSTableExport.java:152)
        at org.apache.cassandra.tools.SSTableExport.serializeAtoms(SSTableExport.java:140)
        at org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:238)
        at org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:223)
        at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:360)
        at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:382)
        at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:394)
        at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:477)

So it seems something is wrong with how I am streaming the data.
These are the relevant parts of the code. This is the pojo to deserialize the json:

    public class PositionJsonModel {
        @JsonProperty("iD") private Long idDevice;
        @JsonProperty("iU") private Long idUnit;
        @JsonProperty("iE") private Integer idEvent;
        @JsonProperty("iTE") private Integer idTypeEvent;
        @JsonProperty("tEv") private String timestampEvent;
        @JsonProperty("tRx") private String timestampRx;
        @JsonProperty("mi") private Long mileage;
        private Long lat;
        private Long lng;
        @JsonProperty("A1") private String country;
        @JsonProperty("A2") private String state;
        @JsonProperty("A3") private String county;
        @JsonProperty("A4") private String city;
        @JsonProperty("A5") private String locality;
        @JsonProperty("st") private String street;
        @JsonProperty("cn") private String civnum;
        @JsonProperty("in") private String info;
        @JsonProperty("sp") private Integer speed;
        // getters, setters, toString ...

And this is the main class:

    BufferedReader reader = new BufferedReader(new FileReader(filename));
    String keyspace = "demodb";
    String columnFamily = "positions8";
    File directory = new File(keyspace);
    if (!directory.exists()) {
        directory.mkdir();
    }
    Murmur3Partitioner partitioner = new Murmur3Partitioner();
    SSTableSimpleUnsortedWriter positionsWriter = new SSTableSimpleUnsortedWriter(directory, partitioner, keyspace, columnFamily, UTF8Type.instance, null, 64);
    String line = "";
    ObjectMapper mapper = new ObjectMapper();
    while ((line = reader.readLine()) != null) {
        long timestamp = System.currentTimeMillis() * 1000;
        System.out.println("timestamp: " + timestamp);
        PositionJsonModel model = mapper.readValue(line, PositionJsonModel.class);
        //CREATE TABLE positions8 (
        //    iddevice bigint,
        //    timestampevent timestamp,
        //    idevent int,
        //    idunit bigint,
        //    status text,
        //    value text,
        //    PRIMARY KEY (iddevice, timestampevent, idevent)
        //) WITH CLUSTERING ORDER BY (timestampevent DESC, idevent ASC)
        List<AbstractType<?>> typeList = new ArrayList<AbstractType<?>>();
        typeList.add(LongType.instance);
        typeList.add(DateType.instance);
        typeList.add(IntegerType.instance);
        CompositeType compositeKeyTypes = CompositeType.getInstance(typeList);
        Builder cpBuilder = new Builder(compositeKeyTypes);
        System.out.println("getIdDevice: " + model.getIdDevice());
        System.out.println("getTimestampEvent: " + model.getTimestampEvent());
        System.out.println("getIdEvent: " + model.getIdEvent());
        cpBuilder.add(bytes(model.getIdDevice()));
        cpBuilder.add(bytes(DateType.dateStringToTimestamp(model.getTimestampEvent
Re: Text searches and free form queries
It works pretty fast. Cool. Just keep an eye out for how big the lucene token row gets. Cheers

Indeed, it may get out of hand, but for now we are OK -- for the foreseeable future, I would say. Should it get larger, I can split it up into rows -- i.e. all tokens that start with a, all tokens that start with b, etc.
Re: Text searches and free form queries
It works pretty fast. Cool. Just keep an eye out for how big the lucene token row gets. Cheers

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 7/10/2012, at 2:57 AM, Oleg Dulin oleg.du...@gmail.com wrote: …
Re: Text searches and free form queries
So, what I ended up doing is this -- as I write my records into the main CF, I tokenize some fields that I want to search on using Lucene and write an index into a separate CF, such that my columns are a composite of:

    luceneToken:recordKey

I can then search my records by doing a slice for each lucene token in the search query and then do an intersection of the sets. It works pretty fast.

Regards, Oleg

On 2012-09-05 01:28:44 +0000, aaron morton said: …
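Oleg's scheme — slice the index row once per query token, then intersect the resulting record-key sets — can be sketched with plain collections standing in for the Cassandra reads. All names below are hypothetical; the in-memory map simulates the index CF whose column names are luceneToken:recordKey composites:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TokenIndexSketch {
    // token -> set of record keys; stands in for a slice over the index CF
    // where each column name is the composite luceneToken:recordKey.
    static final Map<String, Set<String>> INDEX = new HashMap<>();

    static void index(String recordKey, String... tokens) {
        for (String t : tokens) {
            INDEX.computeIfAbsent(t, k -> new HashSet<>()).add(recordKey);
        }
    }

    // AND-search: "slice" once per query token, then intersect the key sets.
    static Set<String> search(String... tokens) {
        Set<String> result = null;
        for (String t : tokens) {
            Set<String> keys = INDEX.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(keys);
            } else {
                result.retainAll(keys); // set intersection
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        index("rec1", "fast", "cassandra", "search");
        index("rec2", "fast", "lucene");
        index("rec3", "cassandra", "search");
        System.out.println(search("cassandra", "search")); // rec1 and rec3
    }
}
```

Splitting the index by first letter, as Oleg suggests for growth, would just shard this map across multiple rows without changing the intersection logic.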
Re: Text searches and free form queries
AFAIK if you want to keep it inside cassandra then DSE, roll your own from scratch, or start with https://github.com/tjake/Solandra . Outside of Cassandra I've heard of people using Elastic Search or Solr, which I *think* is now faster at updating the index. Hope that helps.

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 4/09/2012, at 3:00 AM, Andrey V. Panov panov.a...@gmail.com wrote: …
Text searches and free form queries
Dear Distinguished Colleagues:

I need to add full-text search and somewhat free-form queries to my application. Our data is made up of items that are stored in a single column family, and we have a bunch of secondary indices for look-ups. An item has header fields and data fields, and the structure of the items CF is a super column family with the row key being the item's natural ID, a super column for the header, and a super column for the data. Our application is made up of several redundant/load-balanced servers all pointing at a Cassandra cluster. Our servers run embedded Jetty.

I need to be able to find items by a combination of field values. Currently I have an index for items by field value which works reasonably well. I could also add support for data types and index items by fields of appropriate types, so we can do range queries on items. Ultimately, though, what we want is full-text search with suggestions and human-language sensitivity. We want to search by date ranges, by field values, etc. I did some homework on this topic, and here is what I see as options:

1) Use an SQL database as a helper. This is rather clunky; I'm not sure what it gets us, since just about anything that can be done in SQL can be done in Cassandra with proper structures. The problem here also is where am I going to get an open-source database that can handle the workload? Probably nowhere, nor do I get natural-language support.

2) Each of our servers can index data using Lucene, but again we have to come up with a clunky mechanism where either one of the servers does the indexing and results are replicated, or each server does its own indexing.

3) We can use Solr as is; perhaps with some small modifications it can run within our server JVM -- since we already run embedded Jetty. I like this idea, actually, but I know that Solr indexing doesn't take advantage of Cassandra.
4) Datastax Enterprise with search, presumably, supports Solr indexing of existing column families -- but for the life of me I couldn't figure out how exactly it does that. The Wikipedia example shows that Solr can create column families based on Solr schemas that I can then query using Cassandra itself (which is great), and supposedly I can modify those column families directly and Solr will reindex them (which is even better), but I am not sure how that fits into our server design. The other concern is locking in to a commercial product, something I am very much worried about.

So, one possibility I can see is using Solr embedded within our own server solution but storing its indexes in the file system outside of Cassandra. This is not optimal, and maybe over time I can add my own support for storing the Solr index in Cassandra without relying on the Datastax solution.

In any case, what are your thoughts and experiences?

Regards, Oleg
Re: Text searches and free form queries
Someone did search on Lucene, but for very fresh data they build the search index in memory, so data becomes available for search without delays. On 3 September 2012 22:25, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues:
Online text search with Hadoop/Brisk
I keep reading that Hadoop/Brisk is not suitable for online querying, only for offline/batch processing. What exactly are the reasons it is unsuitable? My use case is a fairly high query load, and each query ideally would return within about 20 seconds. The queries will use indexes to narrow down the result set first, but they also need to support text search on one of the fields. I was thinking of simulating the SQL LIKE statement by running each query as a MapReduce job, so that the text search gets distributed between nodes. I know the recommended approach is to keep a separate full-text index, but that could be quite space-intensive, and also means you can only search on complete words. Any thoughts on this approach? Thanks, Ben
Re: Online text search with Hadoop/Brisk
On Wed, May 11, 2011 at 11:19 AM, Ben Scholl brsch...@gmail.com wrote: …

Brisk was made to be a tight integration of Cassandra, Hadoop and Hive. If you are looking for full-text searches you should look at Solandra, https://github.com/tjake/Solandra, which is a Cassandra backend for the Solr/Lucene indexes. Edward
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
On 03/07/2011 10:08 PM, Aaron Morton wrote: You can fill your boots. So long as your boots have a capacity of 2 billion. Background ... http://wiki.apache.org/cassandra/LargeDataSetConsiderations http://wiki.apache.org/cassandra/CassandraLimitations http://www.pcworld.idg.com.au/article/373483/new_cassandra_can_pack_two_billion_columns_into_row/ Thx, I haven't seen these wiki pages. -- Jean-Christophe Sirot
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
Hello, On 03/06/2011 06:35 PM, Aditya Narayan wrote: Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. Is there any limitation/issue in having a single row with a lot of columns? For instance, can I have millions of columns in a single row? -- Jean-Christophe Sirot
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
You can fill your boots. So long as your boots have a capacity of 2 billion. Background ... http://wiki.apache.org/cassandra/LargeDataSetConsiderations http://wiki.apache.org/cassandra/CassandraLimitations http://www.pcworld.idg.com.au/article/373483/new_cassandra_can_pack_two_billion_columns_into_row/ aaron On 8/03/2011, at 4:57 AM, Jean-Christophe Sirot jean-christophe.si...@cryptolog.com wrote: Hello, On 03/06/2011 06:35 PM, Aditya Narayan wrote: Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. Is there any limitation/issue in having a single row with a lot of columns? For instance, can I have millions of columns in a single row? -- Jean-Christophe Sirot
What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
What would be a good strategy to store large text content (blog posts of around 1500-3000 characters) in Cassandra? I need to store these blog posts along with their metadata like bloggerId and blogTags. I am planning to store this data in a single row, giving each attribute a single column. So one blog per row. Is using a single column for a large blog post like this a good strategy? Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. For storing the number of likes of each comment, I am planning to keep a counter_column in the same row for each comment that will hold a number specifying the number of 'likes' of that comment. Any suggestions on the above design are highly appreciated. Thanks.
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
Sounds reasonable, one CF for the blog post and one CF for the comments. You could also use a single CF if you will often read the blog and the comments at the same time. The best design is the one that suits how your app works, try one and be prepared to change. Note that counters are only in the 0.8 trunk and are still under development; they are not going to be released for a couple of months. Your per-column data size is nothing to be concerned about. Hope that helps. Aaron On 7/03/2011, at 6:35 AM, Aditya Narayan ady...@gmail.com wrote: What would be a good strategy to store large text content (blog posts of around 1500-3000 characters) in Cassandra? I need to store these blog posts along with their metadata like bloggerId and blogTags. I am planning to store this data in a single row, giving each attribute a single column. So one blog per row. Is using a single column for a large blog post like this a good strategy? Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. For storing the number of likes of each comment, I am planning to keep a counter_column in the same row for each comment that will hold a number specifying the number of 'likes' of that comment. Any suggestions on the above design are highly appreciated. Thanks.
Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.
Thanks Aaron!! I didn't know about the upcoming facility for inbuilt counters. This sounds really great for my use case!! Could you let me know where I can read more about this, if it has been blogged about somewhere? I'll go forward with the one (entire) blog per column design. Thanks On Mon, Mar 7, 2011 at 5:10 AM, Aaron Morton aa...@thelastpickle.com wrote: Sounds reasonable, one CF for the blog post and one CF for the comments. You could also use a single CF if you will often read the blog and the comments at the same time. The best design is the one that suits how your app works, try one and be prepared to change. Note that counters are only in the 0.8 trunk and are still under development; they are not going to be released for a couple of months. Your per-column data size is nothing to be concerned about. Hope that helps. Aaron On 7/03/2011, at 6:35 AM, Aditya Narayan ady...@gmail.com wrote: What would be a good strategy to store large text content (blog posts of around 1500-3000 characters) in Cassandra? I need to store these blog posts along with their metadata like bloggerId and blogTags. I am planning to store this data in a single row, giving each attribute a single column. So one blog per row. Is using a single column for a large blog post like this a good strategy? Next, I also need to store the blogComments, which I am planning to store all in another single row, 1 comment per column. Thus the entire information about a single comment, like commentBody and commenter, would be serialized (using Google Protocol Buffers) and stored in a single column. For storing the number of likes of each comment, I am planning to keep a counter_column in the same row for each comment that will hold a number specifying the number of 'likes' of that comment. Any suggestions on the above design are highly appreciated. Thanks.
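The layout this thread converges on (one row per blog post with one column per attribute, plus a separate comments row per post with one serialized comment per column) can be mocked up in memory as follows. All names here are hypothetical; an integer counter stands in for the timestamp/TimeUUID column names a real design would use for chronological ordering, and JSON stands in for Protocol Buffers:

```python
# In-memory mock of the two-CF layout discussed: a "Blogs" map (one entry
# per post, one column per attribute) and a "Comments" map (one entry per
# post, one column per comment, value = serialized blob).
import itertools
import json

blogs, comments = {}, {}
_seq = itertools.count()  # stand-in for a timestamp/TimeUUID column name

def store_post(post_id, body, blogger_id, tags):
    # One row per blog post; each attribute gets its own column.
    blogs[post_id] = {"body": body, "bloggerId": blogger_id,
                      "blogTags": ",".join(tags)}

def add_comment(post_id, commenter, body):
    # One row per post's comments, one column per comment; the value is a
    # serialized blob (JSON here, standing in for Protocol Buffers).
    comments.setdefault(post_id, {})[next(_seq)] = json.dumps(
        {"commenter": commenter, "body": body})

store_post("post1", "A blog post of 1500-3000 characters...", "user42",
           ["cassandra", "nosql"])
add_comment("post1", "alice", "Great write-up!")
add_comment("post1", "bob", "Thanks for sharing.")
print(len(comments["post1"]))  # 2
```

As Aaron notes, per-'like' counters would need the counter support that at the time only existed in the 0.8 trunk; this sketch deliberately leaves them out.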
RE: How can I implement text based searching for the data/entities/items stored in Cassandra ?
You can use: http://code.google.com/p/kundera/ to search text. It provides a way to search by any key over Cassandra. I guess nothing inbuilt is in place for this. Vivek From: rajkumar@gmail.com [rajkumar@gmail.com] on behalf of Aklin_81 [asdk...@gmail.com] Sent: 12 February 2011 17:27 To: user Subject: How can I implement text based searching for the data/entities/items stored in Cassandra ? I would like to text search for some of the Entities/items stored in the database through an AJAX-powered application, such that the user starts typing and gets suggested items as hints. This is implemented in SQL databases using LIKE; is it possible to implement this in an application powered by Cassandra? How do I go forward to implement this feature, which is very much required in my case? Would I have to consider a MySQL DB for implementing this particular feature and keep the rest in Cassandra? Thanks -Asil Impetus to Present Big Data -- Analytics Solutions and Strategies at TDWI World Conference (Feb 13-18) in Las Vegas. We are also bringing cloud experts together at CloudCamp, Delhi on Feb 12. CloudCamp is an unconference where early adopters of Cloud Computing technologies exchange ideas. Click http://www.impetus.com to know more. NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
RE: How can I implement text based searching for the data/entities/items stored in Cassandra ?
Additionally, you can use Cassandra secondary indexes for specific searches. From: Vivek Mishra [vivek.mis...@impetus.co.in] Sent: 12 February 2011 17:38 To: user@cassandra.apache.org Subject: RE: How can I implement text based searching for the data/entities/items stored in Cassandra ? You can use: http://code.google.com/p/kundera/ to search text. It provides a way to search by any key over Cassandra. I guess nothing inbuilt is in place for this. Vivek From: rajkumar@gmail.com [rajkumar@gmail.com] on behalf of Aklin_81 [asdk...@gmail.com] Sent: 12 February 2011 17:27 To: user Subject: How can I implement text based searching for the data/entities/items stored in Cassandra ? I would like to text search for some of the Entities/items stored in the database through an AJAX-powered application, such that the user starts typing and gets suggested items as hints. This is implemented in SQL databases using LIKE; is it possible to implement this in an application powered by Cassandra? How do I go forward to implement this feature, which is very much required in my case? Would I have to consider a MySQL DB for implementing this particular feature and keep the rest in Cassandra? Thanks -Asil
Re: How can I implement text based searching for the data/entities/items stored in Cassandra ?
There is/are Lucandra/Solandra: https://github.com/tjake/Lucandra -- Shaun On Feb 12, 2011, at 6:57 AM, Aklin_81 wrote: I would like to text search for some of the Entities/items stored in the database through an AJAX-powered application, such that the user starts typing and gets suggested items as hints. This is implemented in SQL databases using LIKE; is it possible to implement this in an application powered by Cassandra? How do I go forward to implement this feature, which is very much required in my case? Would I have to consider a MySQL DB for implementing this particular feature and keep the rest in Cassandra? Thanks -Asil
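For the type-ahead use case in this thread, a common Cassandra-era pattern (beyond the Kundera and Lucandra/Solandra suggestions) was to keep the searchable names under an ordered comparator and emulate LIKE 'prefix%' with a range scan over the sorted names. A minimal in-memory sketch of that idea, with Python's sorted list standing in for an ordered column family (all data hypothetical):

```python
# Type-ahead sketch: keep names sorted (as an ordered comparator keeps
# columns sorted on disk) and answer prefix queries with a range scan,
# which is how LIKE 'abc%' is usually emulated without full-text search.
import bisect

names = sorted(["alice", "albert", "bob", "brian", "carol"])

def suggest(prefix, limit=5):
    """Return up to `limit` names starting with `prefix`, in order."""
    lo = bisect.bisect_left(names, prefix)  # first name >= prefix
    out = []
    for name in names[lo:lo + limit]:
        if not name.startswith(prefix):
            break  # left the prefix range; sorted order guarantees no more
        out.append(name)
    return out

print(suggest("al"))  # ['albert', 'alice']
print(suggest("b"))   # ['bob', 'brian']
```

This handles the "user starts typing, gets suggested items" case with prefix matching only; arbitrary mid-word substring search still needs a full-text index such as Lucandra/Solandra.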