Re: Oracle: unexpected operator

2019-05-10 Thread Jeff Jirsa
Probably a JVM check hitting new output that didn't exist previously - which 
version of the JDK? Also, are you using a weird shell, or exporting a weird 
JAVA_HOME or aliases? 
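
A minimal sketch of the kind of check that produces exactly that message,
assuming the script uses a bash-only '==' comparison (the variable name and
value here are illustrative, not taken from the real script):

    #!/bin/sh
    JVM_VENDOR="Oracle"    # e.g. parsed from `java -version` output
    # '==' is a bash-ism; POSIX test only understands '='. Under dash
    # (/bin/sh on Debian/Ubuntu) the next line fails with
    #   [: Oracle: unexpected operator
    if [ $JVM_VENDOR == "OpenJDK" ]; then
        echo "running OpenJDK"
    fi

The "script: 333: [: word: unexpected operator" shape is how dash reports
errors, which is one reason the shell used to run nodetool is worth checking.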

-- 
Jeff Jirsa


> On May 10, 2019, at 12:16 PM, Michael Shuler  wrote:
> 
>> On 5/10/19 2:00 PM, Lou DeGenaro wrote:
>> cassandra-server/bin$ ./nodetool help
>> ./nodetool: 333: [: Oracle: unexpected operator
>> usage: nodetool [(-u <username> | --username <username>)]
>> ...
> 
> What version of Cassandra? That line number is strange. The string "Oracle" 
> along with other garbage seems to be coming, possibly, from the 
> cassandra-env.sh version check? I would check what changes you've made to 
> these files - something isn't right, or at least not the same as what was 
> originally provided.
> 
> (cassandra-3.11)mshuler@hana:~/git/cassandra$ wc -l bin/nodetool
> 115 bin/nodetool
> 
> -- 
> Kind regards,
> Michael
> 


Re: Oracle: unexpected operator

2019-05-10 Thread Michael Shuler

On 5/10/19 2:00 PM, Lou DeGenaro wrote:

cassandra-server/bin$ ./nodetool help
./nodetool: 333: [: Oracle: unexpected operator
usage: nodetool [(-u <username> | --username <username>)]
...


What version of Cassandra? That line number is strange. The string 
"Oracle" along with other garbage seems to be coming, possibly, from the 
cassandra-env.sh version check? I would check what changes you've made 
to these files - something isn't right, or at least not the same as what 
was originally provided.


(cassandra-3.11)mshuler@hana:~/git/cassandra$ wc -l bin/nodetool
115 bin/nodetool

--
Kind regards,
Michael


Oracle: unexpected operator

2019-05-10 Thread Lou DeGenaro
cassandra-server/bin$ ./nodetool help
./nodetool: 333: [: Oracle: unexpected operator
usage: nodetool [(-u <username> | --username <username>)]
...

Why might I get the above "unexpected operator" error?

Thx.

Lou.


Re: Corrupted sstables

2019-05-10 Thread Alain RODRIGUEZ
Hello Roy,

The name of the table makes me think that you might be doing automated
changes to the schema. I just dug into this topic for someone else, and schema
changes are way less consistent than standard Cassandra operations (see
https://issues.apache.org/jira/browse/CASSANDRA-10699).

> sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-I
>
>
Idea 1: Some of these queries might have failed for multiple reasons on a
node (down for too long, race conditions, ...), leaving the cluster in an
unstable state where there is a schema disagreement. In that case, you
could have trouble when adding a new node; I have seen it happen. Could
you check/share with us the output of 'nodetool describecluster'?
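
As a rough sketch (host names are placeholders, and this assumes SSH access
from one box to every node), something like this makes the comparison easy:

    for h in node1 node2 node3; do
        echo "== $h =="
        ssh "$h" nodetool describecluster | grep -A 5 'Schema versions'
    done

All nodes should report the same schema version UUID; seeing more than one
UUID (or UNREACHABLE entries) means there is a disagreement to resolve before
adding the node.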

Also, did you recently try to perform a rolling restart? This often helps
synchronise local schemas and 'could' fix the issue. Another option is
'nodetool resetlocalschema' on the node(s) out of sync.

Idea 2: If you have identified broken secondary indexes, maybe give a try
at running 'nodetool rebuild_index <keyspace> <table> <index>' on
all nodes before adding the next node?
https://cassandra.apache.org/doc/latest/tools/nodetool/rebuild_index.html
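
For example (the keyspace and table below are taken from the sstable path you
posted; the index name is made up, just to show the argument order from the
linked page):

    nodetool rebuild_index sessions_rawdata sessions_v2_2019_05_06 sessions_v2_idx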

Hope this helps,
C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



On Thu, 9 May 2019 at 17:29, Jason Wee wrote:

> maybe print out the value into the logfile and that should lead to some
> clue about where the problem might be?
>
> On Tue, May 7, 2019 at 4:58 PM Paul Chandler  wrote:
> >
> > Roy, we spent a long time trying to fix it, but didn't find a solution;
> it was a test cluster, so we ended up rebuilding the cluster rather than
> spending any more time trying to fix the corruption. We have worked out what
> had caused it, so we were happy it wasn't going to occur in production. Sorry
> that is not much help, but I am not even sure it is the same issue you have.
> >
> > Paul
> >
> >
> >
> > On 7 May 2019, at 07:14, Roy Burstein  wrote:
> >
> > I can say that it happens now as well; currently no node has been
> added/removed.
> > Corrupted sstables are usually the index files, and on some machines the
> sstable does not even exist on the filesystem.
> > On one machine I was able to dump the sstable to a dump file without any
> issue. Any idea how to tackle this issue?
> >
> >
> > On Tue, May 7, 2019 at 12:32 AM Paul Chandler  wrote:
> >>
> >> Roy,
> >>
> >> I have seen this exception before when a column had been dropped then
> re-added with the same name but a different type. In particular, we dropped
> a column and re-created it as static, then had this exception from the old
> sstables created prior to the DDL change.
> >>
> >> Not sure if this applies in your case.
> >>
> >> Thanks
> >>
> >> Paul
> >>
> >> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
> >>
> >> Can the disk have bad sectors? fsck or something similar can help.
> >>
> >> Long shot: repair or any other operation conflicting. Would leave that
> to others.
> >>
> >> On Mon, May 6, 2019 at 3:50 PM Roy Burstein 
> wrote:
> >>>
> >>> It happens on the same column families and they have the same DDL (as
> already posted). I did not check it after cleanup.
> >>>
> >>> On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:
> 
>  This is strange, I never saw this. Does it happen to the same column family?
> 
>  Does it happen after cleanup?
> 
>  On Mon, May 6, 2019 at 3:41 PM Roy Burstein 
> wrote:
> >
> > Yes.
> >
> > On Mon, May 6, 2019, 23:23 Nitan Kainth 
> wrote:
> >>
> >> Roy,
> >>
> >> You mean all nodes show corruption when you add a node to cluster??
> >>
> >>
> >> Regards,
> >> Nitan
> >> Cell: 510 449 9629
> >>
> >> On May 6, 2019, at 2:48 PM, Roy Burstein 
> wrote:
> >>
> >> It happened on all the servers in the cluster every time I added a
> node.
> >> This is a new cluster, nothing was upgraded here; we have a similar
> cluster running on C* 2.1.15 with no issues.
> >> We are aware of the scrub utility; it just reproduces every time we add a
> node to the cluster.
> >>
> >> We have many tables there
> >>
> >>
> >
>
>
>


Re: Schema Management Best Practices

2019-05-10 Thread Alain RODRIGUEZ
Hello Mark

Second,  any ideas what could be creating bottlenecks for schema alteration?


I am not too sure what could be going on to make things take that long, but
as for the corrupted data, I've seen it before. Here are some thoughts
around schema changes and finding the bottlenecks:

Ideally, use commands to make sure there is no disagreement before moving
on to the next schema change. Worst case, *slow down* the pace of the
changes. Schema change queries are not designed to be performant, nor to be
used asynchronously or many times in a row. Thus they should not and
cannot be used as standard queries hammering the cluster, as of now and
imho. For design reasons, it's not good to run asynchronous / fast-paced
'alter table' queries. More information about current design issues and
upcoming improvements is available on Jira:
https://issues.apache.org/jira/browse/CASSANDRA-9449,
https://issues.apache.org/jira/browse/CASSANDRA-9425,
https://issues.apache.org/jira/browse/CASSANDRA-10699, ...

You might still be able to improve speed. If you want to find the bottleneck,
use monitoring dashboards or nodetool:
- 'nodetool describecluster' - shows each node's schema version.
- 'watch -d nodetool tpstats' --> look for pending / dropped operations.
- Check your GC performance to see if there are a lot of / big GC pauses
(maybe https://gceasy.io might help you there).
- Check the logs (for warn/error - missing columns or mismatching schema)
and system.schema_column_families to dig deeper and see what each node
has as a source of truth (see the grep sketch below).
I hope you'll find some clues you can investigate further with one of those
^.
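
For the log check mentioned in the list above, a quick sketch (the paths
assume a default package install; adjust to your layout) to spot long GC
pauses and schema-related warnings:

    grep -i 'GCInspector' /var/log/cassandra/system.log | tail -20
    grep -iE 'WARN|ERROR' /var/log/cassandra/system.log | grep -i 'schema' | tail -20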

Also, 'nodetool resetlocalschema' could help if some nodes are stuck with
an old schema version:
http://cassandra.apache.org/doc/latest/tools/nodetool/resetlocalschema.html.
Often a rolling restart also does the trick. If you need more specific help,
it would be good to share the version you are using so we know where to
look, and an idea of the number of nodes would also help.

Some of this stuff is shown here:
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateTableCollisionFix.html.
The existence of this kind of document shows that this is a common issue, and
it even says clearly: 'Dynamic schema creation or updates can cause schema
collision resulting in errors.'

So for now, I would move slowly, probably automating a check of the output
of 'nodetool describecluster' to make sure the schema has spread before
going on to the next mutation.
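
A rough sketch of that kind of gate (the directory name and the grep pattern
are illustrative, and it assumes cqlsh and nodetool both run from a node of
the cluster):

    for f in schema_changes/*.cql; do
        cqlsh -f "$f"
        # 'nodetool describecluster' prints each schema version as
        # '<uuid>: [ip, ip, ...]'; wait until only one such line remains
        until [ "$(nodetool describecluster | grep -c ': \[')" -le 1 ]; do
            sleep 5
        done
    done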

the number of "alter table" statements was quite large (300+).
>

I must admit I don't know the 'alter table' path well internally, but I
think this is a lot of changes and that it's not designed to happen
quickly. Slow it down and add a control procedure to the process, I would say.

First, is cqlsh the best way to handle these types of loads?


Yes, I see no problem with that. It could also be done through any other
Cassandra client; maybe they would be faster. I never had to do so many
changes at once :). You can give Python or Java a try for this work I
guess. In that case, use synchronous requests and automate checks, I would
say, to definitely stay away from race conditions / data corruption.


>  Our DBAs report that even under normal conditions they send alter table
> statements in small chunks or else they will see load times of 20-45 minutes.


I also often noticed schema changes take some time, but I did not mind
much. Maybe the comments above, which should hopefully keep you away from
race conditions, or the use of some other client (Java instead of cqlsh,
let's say) might help. I guess you could give it a try.

I would definitely start by reading more (Jira/doc/code) about how
Cassandra performs those changes if I had to do this kind of batch of
changes, because schema changes do not seem to be as safe and efficient as
most of Cassandra's internals are nowadays (for mainstream features, not
counting MVs, indexes, triggers, etc.). This common need, making multiple
changes to your data model quickly, should be handled with care and
understanding in Cassandra for now, I would say.

I hope some of the above might be useful to you,

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Thu, 9 May 2019 at 13:40, Mark Bidewell wrote:

> I am doing post-mortem on an issue with our cassandra cluster.  One of our
> tables became corrupt and had to be restored via a backup.  The table
> schema has been undergoing active development, so the number of "alter
> table" statements was quite large (300+).  Currently, we use cqlsh to do
> schema loads.  During the restore, the schema load alone took about 4 hours.
>
> Our DBAs report that even under normal conditions they send alter table
> statements in small chunks or else they will see load times of 20-45 minutes.
>
> My question is two part.  First, is cqlsh the best way to handle these
> types of loads?  Second, any ideas what could be creating bottlenecks for
> schema alteration?

Re: datacorruption with cassandra 2.1.11

2019-05-10 Thread keshava
I will try changing the Java version.
W.r.t. the other point about hardware, I have this issue in multiple setups,
so I really doubt hardware is playing spoilsport here.

On 10-May-2019 11:38, "Jeff Jirsa"  wrote:

> It’s going to be very difficult to diagnose remotely.
>
> I don’t run or have an opinion on jdk7 but I would suspect the following:
>
> - bad hardware (DIMM, disk, network card, motherboard, processor, in that
> order)
> - bad jdk7. I’d be inclined to upgrade to 8 personally, but rolling back
> to the previous version may not be a bad idea
>
>
> You’re in a tough spot if this is spreading. I’d personally be looking to
> try to isolate the source and roll forward or backward as quickly as
> possible. I don’t really suspect a Cassandra 2.1 bug here, but it’s possible
> I suppose. Take a snapshot now, as you may need it to try to recover data
> later.
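>
> A sketch of that (the snapshot tag is arbitrary; 'ccp' is the keyspace shown
> in the cqlsh output further down this thread):
>
>     nodetool snapshot -t before-corruption-debug ccp
>
> Snapshots are hard links to the existing sstables, so this is cheap and fast.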
>
>
> --
> Jeff Jirsa
>
>
> On May 9, 2019, at 10:53 PM, keshava  wrote:
>
> Yes, we do have compression enabled using
> "org.apache.cassandra.io.compress.LZ4Compressor".
> It is spreading: as the number of inserts increases it spreads further.
> Yes, it did start with the JDK and OS upgrade.
>
> Best regards  :)
> keshava Hosahalli
>
>
> On Thu, May 9, 2019 at 7:11 PM Jeff Jirsa  wrote:
>
>> Do you have compression enabled on your table?
>>
>> Did this start with the JDK upgrade?
>>
>> Is the corruption spreading, or is it contained to the same % of
>> entries?
>>
>>
>>
>> On Thu, May 9, 2019 at 4:12 AM keshava  wrote:
>>
>>> Hi, our application is running into a data corruption issue. The application
>>> uses Cassandra 2.1.11 with DataStax Java driver version 2.1.9. So far all was
>>> working fine; recently we changed our deployment environment to OpenJDK
>>> 1.7.191 (earlier it was 1.7.181) and CentOS 7.4 (earlier 6.8). This is
>>> randomly happening for one table: 1 in every 4-5 entries is getting
>>> corrupted. Writing new entries will return success, and when I try to read I
>>> get data not found. When I list all the data available in the table using
>>> cqlsh I see garbage entries like
>>>
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
>>> x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>>
>>> here is the output of the cqlsh
>>>
>>> cqlsh:ccp> select id from socialcontact;
>>>
>>> id 
>>> -->
>>>
>>> 9BA31AE3116A097C3F57FEF9 9BA10FB2116A00103F57FEF9
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
>>> x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
>>> x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
>>> x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
>>> x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> 9BA3236C116A09E63F57FEF9 9BA32536116A09FC3F57FEF9
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
>>> x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
>>> x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00.
>>>
>>> I did enable query tracing on both the Cassandra server and the driver and
>>> didn't notice any differences. Looking for any advice on resolving this
>>> issue.
>>>
>>> PS: I did try upgrading Cassandra to the latest in the 2.1 train but it
>>> didn't help
>>>
>>>