[jira] [Comment Edited] (CASSANDRA-15141) Faster token ownership calculation for NetworkTopologyStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980297#comment-16980297 ] Alex Liu edited comment on CASSANDRA-15141 at 11/22/19 4:28 PM: How does {{[getAddressReplicas()|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] block heartbeat propagation?}} was (Author: alexliu68): How does {{[getAddressReplicas()|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] blocks heartbeat propagation?}} > Faster token ownership calculation for NetworkTopologyStrategy > -- > > Key: CASSANDRA-15141 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > This function > [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] > during removenode and decommission is slow for large vnode cluster with > NetworkTopologyStrategy. As it needs to build whole replications map for > every token range. > In one of our cluster (> 1k nodes), it takes about 20 seconds for each > NetworkTopologyStrategy keyspace, so the total time to process a removenode > message takes at least 80 seconds (20 * 4: 3 system keyspaces, 1 user > keyspace). It blocks the heartbeat propagation and causes false down node. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15141) Faster token ownership calculation for NetworkTopologyStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980297#comment-16980297 ] Alex Liu commented on CASSANDRA-15141: -- How does {{[getAddressReplicas()|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] block heartbeat propagation?}} > Faster token ownership calculation for NetworkTopologyStrategy > -- > > Key: CASSANDRA-15141 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > This function > [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] > during removenode and decommission is slow for large vnode cluster with > NetworkTopologyStrategy. As it needs to build whole replications map for > every token range. > In one of our cluster (> 1k nodes), it takes about 20 seconds for each > NetworkTopologyStrategy keyspace, so the total time to process a removenode > message takes at least 80 seconds (20 * 4: 3 system keyspaces, 1 user > keyspace). It blocks the heartbeat propagation and causes false down node. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
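The slowdown reported in CASSANDRA-15141 comes from recomputing the replica set once per token range. A minimal, self-contained sketch of that per-range loop (all names here are illustrative stand-ins, not Cassandra's actual StorageService or NetworkTopologyStrategy code):

```java
import java.util.*;

// Hypothetical sketch: getAddressReplicas() effectively recomputes replicas
// for every token range, so with T tokens the work is roughly T times the
// cost of a single replica lookup (here modeled as a short ring walk).
public class ReplicaMapSketch {
    static Map<Integer, List<Integer>> buildAddressRanges(List<Integer> ring, int rf) {
        Map<Integer, List<Integer>> addressToRanges = new HashMap<>();
        for (int i = 0; i < ring.size(); i++) {      // one pass per token range...
            for (int r = 0; r < rf; r++) {           // ...walking the ring for rf replicas
                int owner = ring.get((i + r) % ring.size());
                addressToRanges.computeIfAbsent(owner, k -> new ArrayList<>()).add(i);
            }
        }
        return addressToRanges;                      // T * rf steps in total
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> m = buildAddressRanges(Arrays.asList(10, 20, 30, 40), 3);
        System.out.println(m.size());
    }
}
```

In this toy model the cost is only T·RF ring steps; the real NetworkTopologyStrategy calculation additionally walks racks and datacenters per token, which is why a >1k-node vnode cluster can spend ~20 seconds per keyspace.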
[jira] [Commented] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237398#comment-15237398 ] Alex Liu commented on CASSANDRA-11553: -- +1. Closing the cluster automatically closes all of its sessions, so the explicit session close is redundant. > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Fix For: 2.2.6, 3.5, 3.6, 3.0.6 > > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 add session and cluster close to all places in hadoop except > one place on reconnection. > The writer uses one connection per new cluster, so I added cluster.close() > call to sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
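The point above, that closing a driver Cluster cascades to its Sessions, can be demonstrated with minimal stand-in classes (these are not the DataStax driver types, just a sketch of the ownership relationship):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch (not the real driver API): the Cluster owns every Session
// it creates, so cluster.close() cascades and an extra session.close()
// beforehand is redundant.
public class CloseCascadeDemo {
    static class Session implements AutoCloseable {
        boolean closed;
        public void close() { closed = true; }
    }

    static class Cluster implements AutoCloseable {
        final List<Session> sessions = new ArrayList<>();
        Session connect() { Session s = new Session(); sessions.add(s); return s; }
        public void close() { for (Session s : sessions) s.close(); }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Session session = cluster.connect();
        cluster.close();                   // cascades to the session
        System.out.println(session.closed);
    }
}
```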
[jira] [Commented] (CASSANDRA-11243) Memory LEAK CqlInputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172084#comment-15172084 ] Alex Liu commented on CASSANDRA-11243: -- A DSE test on Cassandra 3.0.3 doesn't show this issue; it could be something introduced after 3.0.3. > Memory LEAK CqlInputFormat > -- > > Key: CASSANDRA-11243 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11243 > Project: Cassandra > Issue Type: Bug > Environment: Ubuntu 14.04.04 LTS > Hadoop 2.7 > Cassandra 3.3 >Reporter: Matteo Zuccon > > Error: "util.ResourceLeakDetector: LEAK: You are creating too many > HashedWheelTimer instances. HashedWheelTimer is a shared resource that must > be reused across the JVM,so that only a few instances are created" > Using CqlInputFormat.Class as input format for an Hadoop Mapreduce program > (on distributed Hadoop Cluster) gives a memory leak error. > Version of the library used: > > org.apache.cassandra > cassandra-all > 3.3 > > The same jar is working on a single node Hadoop configuration, the memory > leak error show up in the cluster hadoop configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11001) Hadoop integration is incompatible with Cassandra Driver 3.0.0
[ https://issues.apache.org/jira/browse/CASSANDRA-11001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094292#comment-15094292 ] Alex Liu commented on CASSANDRA-11001: -- It should be OK. > Hadoop integration is incompatible with Cassandra Driver 3.0.0 > -- > > Key: CASSANDRA-11001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11001 > Project: Cassandra > Issue Type: Bug >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski > > When using Hadoop input format with SSL and Cassandra Driver 3.0.0-beta1, we > hit the following exception: > {noformat} > Exception in thread "main" java.lang.NoSuchFieldError: > DEFAULT_SSL_CIPHER_SUITES > at > org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getSSLOptions(CqlConfigHelper.java:548) > at > org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getCluster(CqlConfigHelper.java:315) > at > org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:298) > at > org.apache.cassandra.hadoop.cql3.CqlInputFormat.getSplits(CqlInputFormat.java:131) > {noformat} > Should this be fixed with reflection so that Hadoop input/output formats are > compatible with both old and new driver? > [~jjordan], [~alexliu68] ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060238#comment-15060238 ] Alex Liu commented on CASSANDRA-10837: -- +1. The patch can't be applied in my testing environment because it targets a slightly different branch. The patches look good to me; some simple testing should be enough to verify them. > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-3.0-v4.txt, > 10837-v2-3.0-branch.txt, 10837-v3-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058870#comment-15058870 ] Alex Liu edited comment on CASSANDRA-10837 at 12/15/15 9:32 PM: The changes to CqlRecordWriter seem wrong to me. The Cluster and the Session instances were both properly managed by the try-with-resources statement. In the patch version only the Cluster instance is closed. https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/Cluster.java Cluster doesn't implement the AutoCloseable interface, so it has to be closed manually. Closing the cluster will close all session objects associated with it. Making the Cluster an instance variable in NativeRingCache will trigger an error when the write method is called, as the Cluster has been closed at the end of the constructor. = NativeRingCache uses the cluster to get the metadata in the constructor while the cluster object is still open. Once the initialization in the constructor is done, the cluster object is not used by NativeRingCache anymore. In CqlInputFormat, could you use try-with-resources for both the Cluster and Session instances? I think it is best to do things properly by closing both of them. Similarly, closing the cluster object auto-closes all session objects. was (Author: alexliu68): The changes to CqlRecordWriter seems wrong to me. The Cluster and the Session instances were both properly managed by the try-with-resources statement. In the patch version only the Cluster instance is closed. https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/Cluster.java Cluster doesn't implements AutoClosable interface, so it has to be closes manually. Closing cluster will close all session objects associated with it. Making the Cluster an instance variable in NativeRingCache will trigger an error when the write method will be called, as the Cluster has been closed at the end of the constructor. 
= NativeRingCache uses the cluster to get the metadata in the constructor when cluster object is still open. once the initialization of constructor is done, the cluster object is not used by NativeRingCache anymore. In CqlInputFormat could you use try-with-resources for both Cluster and Session instances. I think it is best to do things properly by closing both of them. Similarly closing cluster object auto-close all sessions objects. > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
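The try-with-resources pattern requested in the review above has a useful built-in guarantee: resources declared later close first (session before cluster), and both close even if the body throws. A sketch with stand-in classes (not the driver API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the reviewed pattern: declaring both resources in one
// try-with-resources statement closes them in reverse declaration order,
// so the session is closed before the cluster that created it.
public class TryWithResourcesOrder {
    static final List<String> closedOrder = new ArrayList<>();

    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        public void close() { closedOrder.add(name); }
    }

    public static void main(String[] args) {
        try (Resource cluster = new Resource("cluster");
             Resource session = new Resource("session")) {
            // use the session here; both resources are closed on exit,
            // even if this block throws
        }
        System.out.println(closedOrder); // later-declared resources close first
    }
}
```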
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058870#comment-15058870 ] Alex Liu commented on CASSANDRA-10837: -- The changes to CqlRecordWriter seem wrong to me. The Cluster and the Session instances were both properly managed by the try-with-resources statement. In the patch version only the Cluster instance is closed. https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/Cluster.java Cluster doesn't implement the AutoCloseable interface, so it has to be closed manually. Closing the cluster will close all session objects associated with it. Making the Cluster an instance variable in NativeRingCache will trigger an error when the write method is called, as the Cluster has been closed at the end of the constructor. = NativeRingCache uses the cluster to get the metadata in the constructor while the cluster object is still open. Once the initialization in the constructor is done, the cluster object is not used by NativeRingCache anymore. In CqlInputFormat, could you use try-with-resources for both the Cluster and Session instances? I think it is best to do things properly by closing both of them. Similarly, closing the cluster object auto-closes all session objects. > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. 
> {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059195#comment-15059195 ] Alex Liu commented on CASSANDRA-10837: -- Attached a v3 patch that uses try-with-resources and passes the cluster to NativeRingCache, to minimize creating new cluster objects > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt, > 10837-v3-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-v3-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt, > 10837-v3-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059162#comment-15059162 ] Alex Liu commented on CASSANDRA-10837: -- Summary of the changes for the next patch: 1. Keep using try-with-resources for the cluster and session (no need to close them manually). 2. Pass the cluster to NativeRingCache (don't create too many new cluster objects). > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > Attachments: 10837-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: (was: 10837-3.0-branch.txt) > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-v2-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > Attachments: 10837-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
Alex Liu created CASSANDRA-10837: Summary: Cluster/session should be closed in Cassandra Hadoop Input/Output classes Key: CASSANDRA-10837 URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Assignee: Alex Liu See a lot of following warnings during Hadoop job running {code} ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created. {code} Each cluster/session needs be closed and a shared HashedWheelTimer may reduce the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu reassigned CASSANDRA-10806: Assignee: Alex Liu > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Alex Liu >Assignee: Alex Liu >Priority: Minor > Attachments: CASSANDRA-10806-3.0-branch.txt > > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10806: - Attachment: CASSANDRA-10806-3.0-branch.txt > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Alex Liu >Priority: Minor > Attachments: CASSANDRA-10806-3.0-branch.txt > > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10806: - Summary: sstableloader can't handle upper case keyspace (was: sstableloader can) > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Priority: Minor > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036549#comment-15036549 ] Alex Liu commented on CASSANDRA-10806: -- keyspace should be quoted at https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/NativeSSTableLoaderClient.java#L78 > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Priority: Minor > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
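The quoting fix suggested above matters because CQL folds unquoted identifiers to lower case, so a mixed-case keyspace such as Test1 only round-trips if it is double-quoted when spliced into a query string. A hypothetical helper sketching that fix (illustrative names, not the actual NativeSSTableLoaderClient code):

```java
// Hypothetical sketch of the suggested fix: wrap the identifier in double
// quotes, doubling any embedded quotes, so mixed-case names survive CQL's
// lower-case folding of unquoted identifiers.
public class CqlIdentifier {
    static String quote(String identifier) {
        return '"' + identifier.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM " + quote("Test1") + "." + quote("Words");
        System.out.println(query); // keyspace and table keep their case
    }
}
```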
[jira] [Updated] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10806: - Component/s: Tools > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Alex Liu >Priority: Minor > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10806) sstableloader can
Alex Liu created CASSANDRA-10806: Summary: sstableloader can Key: CASSANDRA-10806 URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Priority: Minor sstableloader can't handle upper case keyspace. The following shows the endpoint is missing {code} cassandra/bin/sstableloader /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 -d 127.0.0.1 objc[7818]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined. Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db to [] Summary statistics: Connections per host:: 1 Total files transferred: : 0 Total bytes transferred: : 0 Total duration (ms): : 923 Average transfer rate (MB/s): : 0 Peak transfer rate (MB/s):: 0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10751) "Pool is shutdown" error when running Hadoop jobs on Yarn
[ https://issues.apache.org/jira/browse/CASSANDRA-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025825#comment-15025825 ] Alex Liu commented on CASSANDRA-10751: -- Cool. Can you check the latest C* 2.2.x and C* 3.x? If possible, submit a patch. > "Pool is shutdown" error when running Hadoop jobs on Yarn > - > > Key: CASSANDRA-10751 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10751 > Project: Cassandra > Issue Type: Bug > Environment: Hadoop 2.7.1 (HDP 2.3.2) > Cassandra 2.1.11 >Reporter: Cyril Scetbon >Assignee: Alex Liu > Attachments: output.log > > > Trying to execute a Hadoop job on Yarn, I get errors from Cassandra's > internal code. It seems that connections are shut down, but we can't understand > why. > Here is an extract of the errors. I also attach a file with the complete debug > logs. > {code} > 15/11/22 20:05:54 [main]: DEBUG core.RequestHandler: Error querying > node006.internal.net/192.168.12.22:9042, trying next host (error is: > com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown) > Failed with exception java.io.IOException:java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > 15/11/22 20:05:54 [main]: ERROR CliDriver: Failed with exception > java.io.IOException:java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > java.io.IOException: java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > 
(com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415) > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140) > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1672) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > at > org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:132) > at > org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:674) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:324) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446) > ... 
15 more > Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All > host(s) tried for query failed (tried: > node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > at > com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) > at > com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) > at > com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:214) > at >
[jira] [Commented] (CASSANDRA-10751) "Pool is shutdown" error when running Hadoop jobs on Yarn
[ https://issues.apache.org/jira/browse/CASSANDRA-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15024873#comment-15024873 ] Alex Liu commented on CASSANDRA-10751: -- Can you get the full stack trace by changing the log level of Host.STATES to TRACE? > "Pool is shutdown" error when running Hadoop jobs on Yarn > - > > Key: CASSANDRA-10751 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10751 > Project: Cassandra > Issue Type: Bug > Environment: Hadoop 2.7.1 (HDP 2.3.2) > Cassandra 2.1.11 >Reporter: Cyril Scetbon >Assignee: Alex Liu > Attachments: output.log
[jira] [Commented] (CASSANDRA-10751) "Pool is shutdown" error when running Hadoop jobs on Yarn
[ https://issues.apache.org/jira/browse/CASSANDRA-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023480#comment-15023480 ] Alex Liu commented on CASSANDRA-10751: -- The error indicates that the C* node can't handle the load (maybe there are too many splits). Have you tried the latest C* 2.2.x or C* 3.x? > "Pool is shutdown" error when running Hadoop jobs on Yarn > - > > Key: CASSANDRA-10751 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10751 > Project: Cassandra > Issue Type: Bug > Environment: Hadoop 2.7.1 (HDP 2.3.2) > Cassandra 2.1.11 >Reporter: Cyril Scetbon >Assignee: Alex Liu > Attachments: output.log
[jira] [Commented] (CASSANDRA-10640) hadoop splits are calculated wrong
[ https://issues.apache.org/jira/browse/CASSANDRA-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998855#comment-14998855 ] Alex Liu commented on CASSANDRA-10640: -- +1 > hadoop splits are calculated wrong > -- > > Key: CASSANDRA-10640 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10640 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Aleksey Yeschenko > Fix For: 2.2.x > > Attachments: 10640.txt > > > A typo at line > https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java#L216 > where getEnd should be used -- This message was sent by Atlassian JIRA (v6.3.4#6332)
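As a hedged illustration of the class of typo fixed here (toy code, not the actual AbstractColumnFamilyInputFormat): when subsplitting a token range into Hadoop splits, the final subsplit must be capped by the parent range's end token; accidentally reusing the start accessor where the end belongs silently mis-sizes the splits.

```java
// Toy split calculator over a half-open token interval [start, end).
// The cap on the last chunk is where a getStart()/getEnd() mix-up
// would bite: capping with `start` instead of `end` drops data.
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Splits [start, end) into chunks of at most `size` tokens.
    static List<long[]> split(long start, long end, long size) {
        List<long[]> splits = new ArrayList<>();
        for (long s = start; s < end; s += size) {
            // Cap at the range's END token (the fix), not its start.
            splits.add(new long[] { s, Math.min(s + size, end) });
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : split(0, 10, 4)) {
            System.out.println(s[0] + ".." + s[1]); // 0..4, 4..8, 8..10
        }
    }
}
```

Together the chunks cover the whole range exactly once, which is the invariant the one-line patch restores.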
[jira] [Commented] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992236#comment-14992236 ] Alex Liu commented on CASSANDRA-10648: -- C* 2.1.x works fine. I shut down all other programs, and the C* node's native protocol stays dead even with no client running against it. > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. > It's tested on C* 2.2.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
Alex Liu created CASSANDRA-10648: Summary: Native protocol is dead after running some Hive queries Key: CASSANDRA-10648 URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Fix For: 2.2.x I'm testing the DSE portfolio demo, which basically creates a few C* tables, inserts some data into the tables, and then runs some Hive queries on them. I attach the Hive queries. After some queries are done, the C* node is dead on the native port and cqlsh can't log in any more. Some thread dumps are attached. Too many threads are in waiting mode and the system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10648: - Attachment: deadlock1.txt deadlock.txt > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10648: - Attachment: 10_day_loss.q > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988286#comment-14988286 ] Alex Liu commented on CASSANDRA-10648: -- set hive.auto.convert.join=false; in Hive to disable MapJoin > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10648: - Description: When test on DSE portfolio demo, which basically creates a few C* tables and inserts some data into the tables, then run some Hive queries on the tables. I attach the Hive queries After some queries are done, C* node is dead on native port, cqlsh can't login any more. Some thread dumps are attached. Too many threads are in waiting mode and system is not responding. It's tested on C* 2.2.3 was: When test on DSE portfolio demo, which basically creates a few C* tables and inserts some data into the tables, then run some Hive queries on the tables. I attach the Hive queries After some queries are done, C* node is dead on native port, cqlsh can't login any more. Some thread dumps are attached. Too many threads are in waiting mode and system is not responding. > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. > It's tested on C* 2.2.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10640) hadoop splits are calculated wrong
Alex Liu created CASSANDRA-10640: Summary: hadoop splits are calculated wrong Key: CASSANDRA-10640 URL: https://issues.apache.org/jira/browse/CASSANDRA-10640 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Fix For: 2.2.x A typo at line https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java#L216 where getEnd should be used -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v1-2.2.txt Attached the patch on the cassandra-2.2 branch. It also fixes some issues with setting the partitioner and with a zero split count. NativeSSTableLoaderClient has a bug loading ByteOrderedPartitioner sstables; it works fine with Murmur3Partitioner. > Pig support for BulkOutputFormat as a parameter in url > -- > > Key: CASSANDRA-7410 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 > Project: Cassandra > Issue Type: Improvement > Components: Hadoop >Reporter: Alex Liu >Assignee: Alex Liu >Priority: Minor > Fix For: 2.1.x, 2.2.x > > Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, > 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v1-2.2.txt, > CASSANDRA-7410-v2-2.1-branch.txt, CASSANDRA-7410-v3-2.1-branch.txt, > CASSANDRA-7410-v4-2.0-branch.txt, CASSANDRA-7410-v4-2.1-branch.txt, > CASSANDRA-7410-v5-2.0-branch.txt > > > Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10058) Close Java driver Client object in Hadoop and Pig classes
Alex Liu created CASSANDRA-10058: Summary: Close Java driver Client object in Hadoop and Pig classes Key: CASSANDRA-10058 URL: https://issues.apache.org/jira/browse/CASSANDRA-10058 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu I found that some Hadoop and Pig code in Cassandra doesn't close the Client object, that's the cause for the following errors in java driver 2.2.0-rc1. {code} ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created. {code} We should close the Client objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
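A minimal sketch of the fix pattern (DriverClient here is a stand-in class, not the real java-driver API): make the client's lifecycle explicit and close it deterministically, for example with try-with-resources, so the driver can release shared resources such as its HashedWheelTimer instead of leaking one per job.

```java
// Sketch of deterministic client shutdown. DriverClient is a hypothetical
// stand-in for the driver's closeable client object.
public class ClientLifecycle {
    static class DriverClient implements AutoCloseable {
        boolean closed = false;
        void execute(String cql) { /* send the query */ }
        @Override
        public void close() { closed = true; /* release timers, connections */ }
    }

    static DriverClient lastClient; // kept visible so the demo can inspect it

    static void runJob() {
        // try-with-resources guarantees close() runs even if the job throws,
        // which is what the Hadoop/Pig code paths were missing.
        try (DriverClient client = new DriverClient()) {
            lastClient = client;
            client.execute("SELECT * FROM ks.tbl");
        }
    }

    public static void main(String[] args) {
        runJob();
        System.out.println(lastClient.closed); // true
    }
}
```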
[jira] [Updated] (CASSANDRA-10058) Close Java driver Client object in Hadoop and Pig classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10058: - Attachment: CASSANDRA-10058-2.2.txt Close Java driver Client object in Hadoop and Pig classes - Key: CASSANDRA-10058 URL: https://issues.apache.org/jira/browse/CASSANDRA-10058 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Fix For: 2.2.x Attachments: CASSANDRA-10058-2.2.txt I found that some Hadoop and Pig code in Cassandra doesn't close the Client object, that's the cause for the following errors in java driver 2.2.0-rc1. {code} ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created. {code} We should close the Client objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: CASSANDRA-8576-v1-2.2-branch.txt Sorry guys, I missed the above comments. I attach the patch on cassandra-2.2 branch Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.2.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v1-2.2-branch.txt, CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538044#comment-14538044 ] Alex Liu commented on CASSANDRA-8576: - It's not much different, but I will use your changes :) Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: CASSANDRA-8576-v3-2.1-branch.txt Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520084#comment-14520084 ] Alex Liu commented on CASSANDRA-8576: - Which branch should this go into? Is it still going into 2.1.5, or another release? Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: CASSANDRA-8576-v2-2.1-branch.txt v2 is attached which addresses the comments. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520395#comment-14520395 ] Alex Liu commented on CASSANDRA-8576: - if token == null, containToken is false. All other comments will be addressed in the new patch Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
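A hedged sketch of the null guard described in the comment above (identifier names are illustrative, not the patch's actual code): the token predicate is only pushed down when a token is actually present, so a null token short-circuits to false rather than throwing.

```java
// Illustrative null guard for token-based predicate pushdown.
// Range bounds follow a half-open [rangeStart, rangeEnd) convention here.
public class TokenGuard {
    static boolean containsToken(Long token, long rangeStart, long rangeEnd) {
        // If token == null, containsToken is false: nothing to push down.
        return token != null && token >= rangeStart && token < rangeEnd;
    }

    public static void main(String[] args) {
        System.out.println(containsToken(null, 0, 100)); // false
        System.out.println(containsToken(42L, 0, 100));  // true
    }
}
```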
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v5-2.0-branch.txt CASSANDRA-7410-v4-2.1-branch.txt Patches addressing comments are attached. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt, CASSANDRA-7410-v3-2.1-branch.txt, CASSANDRA-7410-v4-2.0-branch.txt, CASSANDRA-7410-v4-2.1-branch.txt, CASSANDRA-7410-v5-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498218#comment-14498218 ] Alex Liu commented on CASSANDRA-7410: - Any reason for not using super.setStoreLocation()? -- We will remove the CqlStorage class soon, so try not to couple with it any more. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v4-2.0-branch.txt CASSANDRA-7410-v3-2.1-branch.txt V3 on the 2.0 branch and v4 on the 2.1 branch are attached; they address the comments and update the code to the latest branch. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt, CASSANDRA-7410-v3-2.1-branch.txt, CASSANDRA-7410-v4-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498768#comment-14498768 ] Alex Liu commented on CASSANDRA-7410: - This change is no longer needed, because I ported the 2.1 patch back to 2.0, which uses BulkLoader.ExternalClient. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496805#comment-14496805 ] Alex Liu commented on CASSANDRA-6348: - Hive has a setting to enable pushdown; by default it's disabled. Users can enable it if the table has only one indexed column. TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering -- Key: CASSANDRA-6348 URL: https://issues.apache.org/jira/browse/CASSANDRA-6348 Project: Cassandra Issue Type: Bug Components: Core Reporter: Alex Liu Assignee: Alex Liu Attachments: 6348.txt If the index row is too big and filtering can't find a matching CQL row in the base CF, it keeps scanning the index row and retrieving the base CF until the index row is scanned completely, which may take too long, and the thrift server returns a TimeoutException. This is one of the reasons why we shouldn't index a column if the index is too big. Multiple-index merging can resolve the case where there are only EQUAL clauses (CASSANDRA-6048 addresses it). If the query has non-EQUAL clauses, we still need to do data filtering, which might lead to a timeout exception. We can either disable those kinds of queries or WARN the user that data filtering might lead to a timeout exception or OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
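The pattern the description warns about can be illustrated with a hypothetical CQL schema and query — the table, column, and index names below are invented for illustration and do not come from the ticket:

```sql
-- Hypothetical example of the pattern described above: 'city' is the
-- indexed column (its index row may grow very large), and the non-EQUAL
-- clause on 'age' forces row-by-row filtering of every base-CF row
-- fetched via the index -- exactly the scan that can run until timeout.
CREATE TABLE users (id uuid PRIMARY KEY, city text, age int);
CREATE INDEX users_city_idx ON users (city);

SELECT * FROM users WHERE city = 'NYC' AND age > 30 ALLOW FILTERING;
```

If no row in the base CF actually satisfies both clauses, the whole index row for 'NYC' still gets walked before the query gives up.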
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493117#comment-14493117 ] Alex Liu commented on CASSANDRA-8576: - It's been this way from the very beginning. Internally, URL decoding is used. I don't think there's an easy way around it here. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.5 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493122#comment-14493122 ] Alex Liu commented on CASSANDRA-8576: - Someone from Product Management should be able to answer it. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.5 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393027#comment-14393027 ] Alex Liu commented on CASSANDRA-9074: - Can you provide details on how to reproduce the issue (table schema, data, Hadoop query, etc.), so we can reproduce and debug it? Does it error out in a one-node cluster? Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows are read into the map phase. I checked the CqlInputFormat source code and noticed that a CQL query is built to select node-local data and that a LIMIT clause is also added (1k default). So the 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total. The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with the questions below: Is this desired behavior? Why does CqlInputFormat not page through the rest of the rows? Is it a bug, or should I just increase the InputCQLPageRowSize value? What if I want to read all data in the table and do not know the row count? What if the number of rows I need to read per cassandra node is very large - in other words, how do I avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
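The arithmetic the reporter describes can be sketched in a few lines of Java — a toy illustration of the reported (pre-fix) behavior, not Cassandra code; it assumes rows are spread evenly across the nodes' splits:

```java
// Toy model of the behavior described above: each node-local split is
// read with a LIMIT clause (1k by default), so a job sees at most
// nodes * limit rows. Assumes rows are spread evenly across nodes.
public class PageLimitSketch {
    static long rowsReadWithLimit(long totalRows, int nodes, int pageRowLimit) {
        // Each of the `nodes` splits is capped at `pageRowLimit` rows.
        return Math.min(totalRows, (long) nodes * pageRowLimit);
    }

    public static void main(String[] args) {
        // The reporter's numbers: 7 nodes * 1k limit = 7k of 10k rows.
        System.out.println(rowsReadWithLimit(10_000, 7, 1_000)); // 7000
        // Raising the limit (e.g. via CqlConfigHelper.setInputCQLPageRowSize,
        // as the reporter notes) lifts the cap so all rows reach the map phase.
        System.out.println(rowsReadWithLimit(10_000, 7, 2_000)); // 10000
    }
}
```

This also makes the OOM question concrete: lifting the cap by raising the per-split limit trades completeness against per-task memory, which is why paging (rather than one huge LIMIT) is the eventual fix.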
[jira] [Issue Comment Deleted] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-9074: Comment: was deleted (was: Can you provide detail how to reproduce the issue like. Table schema, data and Hadoop query ... etc, so we can reproduce it and debug it. Does it error out in a one node cluster?) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391765#comment-14391765 ] Alex Liu commented on CASSANDRA-8576: - pending on CASSANDRA-8358 Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.5 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388706#comment-14388706 ] Alex Liu commented on CASSANDRA-9074: - yes, please test it on the latest branch. Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387392#comment-14387392 ] Alex Liu commented on CASSANDRA-9074: - Please try the latest 2.1.x or 2.0.x branch, it should have been fixed. Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387608#comment-14387608 ] Alex Liu commented on CASSANDRA-9074: - CASSANDRA-8166 Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6432) Calculate estimated Cql row count per token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368292#comment-14368292 ] Alex Liu commented on CASSANDRA-6432: - This ticket is for the count of rows returned by the Java driver per token range. Is CASSANDRA-7688 for that? Calculate estimated Cql row count per token range - Key: CASSANDRA-6432 URL: https://issues.apache.org/jira/browse/CASSANDRA-6432 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Fix For: 2.0.14 CASSANDRA-6311 uses the client side to calculate the actual CF row count for the hadoop job. We need to fix it by using the CQL row count, which needs an estimated CQL row count per token range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API
[ https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365885#comment-14365885 ] Alex Liu commented on CASSANDRA-8358: - When can this ticket be committed to the 2.1 branch? Bundled tools shouldn't be using Thrift API --- Key: CASSANDRA-8358 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358 Project: Cassandra Issue Type: Improvement Reporter: Aleksey Yeschenko Assignee: Philip Thompson Fix For: 3.0 In 2.1, we switched cqlsh to the python-driver. In 3.0, we got rid of cassandra-cli. Yet there is still code that's using legacy Thrift API. We want to convert it all to use the java-driver instead. 1. BulkLoader uses Thrift to query the schema tables. It should be using java-driver metadata APIs directly instead. 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift 5. o.a.c.hadoop.pig.CqlStorage is using Thrift Some of the things listed above use Thrift to get the list of partition key columns or clustering columns. Those should be converted to use the Metadata API of the java-driver. Somewhat related to that, we also have badly ported code from Thrift in o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches columns from schema tables instead of properly using the driver's Metadata API. We need all of it fixed. One exception, for now, is o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its describe_splits_ex() call that cannot be currently replaced by any java-driver call (?). Once this is done, we can stop starting Thrift RPC port by default in cassandra.yaml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Comment: was deleted (was: Waiting for CASSANDRA-8358) Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365910#comment-14365910 ] Alex Liu commented on CASSANDRA-7410: - [~brandon.williams] Do you have time to review it? Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365884#comment-14365884 ] Alex Liu commented on CASSANDRA-7410: - Waiting for CASSANDRA-8358 Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8924) Streaming failures during bulkloading data using CqlBulkOutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355963#comment-14355963 ] Alex Liu commented on CASSANDRA-8924: - It's fixed in CASSANDRA-7410 patch, but it's waiting for pig test fix. Streaming failures during bulkloading data using CqlBulkOutputFormat Key: CASSANDRA-8924 URL: https://issues.apache.org/jira/browse/CASSANDRA-8924 Project: Cassandra Issue Type: Bug Components: Core, Hadoop Environment: 1 node, Ubuntu 14.04 , vnodes enabled, Oracle JVM, Java 1.7.0_51, Cassandra 2.1.3 Reporter: Aby Kuruvilla Labels: BulkLoading, CqlBulkOutputFormat, Hadoop, Streaming Attachments: 0001-use-cql-to-fetch-CFMetaData.patch I am trying to use the CqlBulkOutputFormat in a Hadoop job to bulk load data into Cassandra. On running the Hadoop job, I can see that the SSTable files do get generated but fails to stream the data out. I get the same exception when I try with Cassandra node on localhost as well as a remote Cassandra node. Also I get this exception on C* versions 2.1.1, 2.1.2 and 2.1.3. Relevant portion of logs and stack trace {noformat} 09:20:23.207 [Thread-6] WARN org.apache.cassandra.utils.CLibrary - JNA link failure, one or more native method will be unavailable. 
09:20:23.208 [Thread-6] DEBUG org.apache.cassandra.utils.CLibrary - JNA link failure details: Error looking up function 'posix_fadvise': dlsym(0x7fff6ab8a5e0, posix_fadvise): symbol not found 09:20:23.504 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Filter.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Filter.db 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Digest.sha1 to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Digest.sha1 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Statistics.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Statistics.db 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Index.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Index.db 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-CompressionInfo.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-CompressionInfo.db 09:20:23.506 [Thread-6] 
DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-TOC.txt to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-TOC.txt 09:20:23.506 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Data.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Data.db 09:20:23.727 [Thread-2] DEBUG o.a.c.i.s.m.MetadataSerializer - Load metadata for /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1 09:20:23.729 [Thread-2] INFO o.a.c.io.sstable.SSTableReader - Opening /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1 (617874 bytes) 09:20:23.780 [Thread-2] INFO o.a.c.streaming.StreamResultFuture - [Stream #98ba8730-c279-11e4-b8e9-55374d280508] Executing streaming plan for Bulk Load 09:20:23.781 [StreamConnectionEstablisher:1] INFO o.a.c.streaming.StreamSession - [Stream #98ba8730-c279-11e4-b8e9-55374d280508] Starting streaming to
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349527#comment-14349527 ] Alex Liu commented on CASSANDRA-8576: - The Pig test on trunk fails; Philip Thompson is fixing it. I attached the patch on trunk, but we need to merge it with Philip Thompson's fix. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: 8576-trunk.txt Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v2-2.1-branch.txt V2 version fixes the streaming error Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.13 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349029#comment-14349029 ] Alex Liu commented on CASSANDRA-6091: - Yes, please add cassandra-2.1 as well. If we can't cleanly merge it into trunk, we need another one for trunk. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: mck Attachments: cassandra-2.0-6091.txt CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347173#comment-14347173 ] Alex Liu commented on CASSANDRA-6091: - LGTM, +1 Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: mck Attachments: cassandra-2.0-6091.txt CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347505#comment-14347505 ] Alex Liu commented on CASSANDRA-7410: - {{CqlBulkRecordWriter}} comes with its own {{ExternalClient}}, which takes the wrong approach to generating the CFMetaData (and the cfId). We can fix the issue by using the approach found in {{BulkLoader.ExternalClient}}, which fetches the CFMetaData from the cluster using CQL. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.13 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277925#comment-14277925 ] Alex Liu edited comment on CASSANDRA-8576 at 2/20/15 5:04 PM: -- A v1 patch is attached; it only supports full-partition-key EQ queries. To test it with, e.g., a table whose partition columns are key1 and key2: {code} set the where clause key1 = 'key1' and key2 = 111 and column1=100 in the pig url as input_cql=select%20*%20from%20compositekeytable%20where%20key1%20%3D%20%27key1%27%20and%20key2%20%3D%20111%20and%20column1%3D100 {code} was (Author: alexliu68): v1 patch is attached to only support full partition key EQ queries. To test it, e.g. table with key1, and key2 partition columns {code} set where clause as key1 = 'key1' and key2 = 111 and column1=100 in pig url {code} Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
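As an aside, the percent-encoded {{input_cql}} value in the comment above can be produced mechanically rather than by hand. This is a hedged illustration, not part of the patch; any standard URL-encoding routine works, shown here with Python's {{urllib.parse}}:

```python
from urllib.parse import quote, unquote

# The where-clause query from the comment above, in plain CQL.
query = ("select * from compositekeytable where key1 = 'key1' "
         "and key2 = 111 and column1=100")

# Percent-encode for use as the input_cql parameter of the pig url.
# safe='*' leaves '*' literal (as in the example) while escaping
# spaces (%20), '=' (%3D), and quotes (%27).
input_cql = quote(query, safe='*')
print(input_cql)

# Decoding recovers the original CQL, so the loader sees the raw query.
assert unquote(input_cql) == query
```

This reproduces the exact {{input_cql=...}} string shown in the comment, which makes it easy to build pig urls for other tables and predicates.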
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329142#comment-14329142 ] Alex Liu commented on CASSANDRA-8576: - [~brandon.williams] Do you have time to review this ticket? Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277925#comment-14277925 ] Alex Liu edited comment on CASSANDRA-8576 at 2/20/15 5:03 PM: -- A v1 patch is attached; it only supports full-partition-key EQ queries. To test it with, e.g., a table whose partition columns are key1 and key2: {code} set the where clause key1 = 'key1' and key2 = 111 and column1=100 in the pig url {code} was (Author: alexliu68): v1 patch is attached to only support full partition key EQ queries. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8774) BulkOutputFormat never completes if streaming have errors
[ https://issues.apache.org/jira/browse/CASSANDRA-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318684#comment-14318684 ] Alex Liu commented on CASSANDRA-8774: - Why not handle ExecutorException errors as a whole if there are no other none-critical root causes? BulkOutputFormat never completes if streaming have errors - Key: CASSANDRA-8774 URL: https://issues.apache.org/jira/browse/CASSANDRA-8774 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Erik Forsberg Fix For: 2.0.13 Attachments: 0001-CASSANDRA-8774-Handle-StreamException-when-bulkloadi.patch With BulkoutputFormat in Cassandra 1.2.18, if any streaming errors occured, the hadoop task would fail. This doesn't seem to happen with 2.0.12. I have a hadoop map task that use BulkoutputFormat. If one of the cassandra nodes I'm writing to is down, I'm getting the following syslog output from the map task: {noformat} 2015-02-10 10:54:15,162 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2015-02-10 10:54:15,601 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2015-02-10 10:54:15,901 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0 2015-02-10 10:54:15,907 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4984451e 2015-02-10 10:54:16,110 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://hdpmt01.osp-hadoop.osa:9000/user/jenkins/syst/5ef13_osp/tvstore/sumcombinations/hourly/2015021002/per_period-5ba2faa4b1e4aa21fa163e82bc46-sumcombinations/0/data/part-00047:0+462 2015-02-10 10:54:16,739 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded initialized native-zlib library 2015-02-10 10:54:16,740 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new 
decompressor 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2015-02-10 10:54:16,927 ERROR org.apache.cassandra.cql3.QueryProcessor: Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM. 2015-02-10 10:54:17,780 INFO org.apache.cassandra.utils.CLibrary: JNA not found. Native methods will be disabled. 2015-02-10 10:54:19,446 INFO org.apache.cassandra.io.sstable.SSTableReader: Opening /opera/log1/hadoop/mapred/local/taskTracker/jenkins/jobcache/job_201502041226_13903/attempt_201502041226_13903_m_00_0/work/tmp/syst5ef13osp/Data_hourly/syst5ef13osp-Data_hourly-jb-1 (1018 bytes) 2015-02-10 10:54:20,713 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Executing streaming plan for Bulk Load 2015-02-10 10:54:20,713 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:7 2015-02-10 10:54:20,714 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:8 2015-02-10 10:54:20,715 INFO org.apache.cassandra.streaming.StreamSession: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to /ipv6:prefix:1:441:0:0:0:7 2015-02-10 10:54:20,730 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:4 2015-02-10 10:54:20,750 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:3 2015-02-10 10:54:20,731 INFO org.apache.cassandra.streaming.StreamSession: [Stream 
#29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to /ipv6:prefix:1:441:0:0:0:8 2015-02-10 10:54:20,750 INFO org.apache.cassandra.streaming.StreamSession: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to /ipv6:prefix:1:441:0:0:0:4 2015-02-10 10:54:20,770 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:6 2015-02-10 10:54:20,778 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:5 2015-02-10 10:54:20,786 INFO
[jira] [Updated] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8609: Assignee: Philip Thompson (was: Alex Liu) Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Philip Thompson Fix For: 3.0 Attachments: CASSANDRA-8609-3.0-branch.txt For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8609: Attachment: CASSANDRA-8609-3.0-branch.txt Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 Attachments: CASSANDRA-8609-3.0-branch.txt For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319020#comment-14319020 ] Alex Liu commented on CASSANDRA-8609: - I ran pig-test on trunk and found some failing test cases; I am fixing those in this ticket as well. Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319176#comment-14319176 ] Alex Liu commented on CASSANDRA-8609: - All pig tests fail, and CASSANDRA-8358 is addressing the issue. I am attaching my patch against cassandra-3.0 as a reference for [~philipthompson]. He will take over this ticket and address it in CASSANDRA-8358. Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318501#comment-14318501 ] Alex Liu commented on CASSANDRA-8609: - Sorry, I missed this ticket. I will work on it today or tomorrow to get it done. Does this ticket only remove Cell and CellName from the Hadoop-related classes? Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316667#comment-14316667 ] Alex Liu commented on CASSANDRA-6091: - We need to at least check the total estimated rows of the multiple token ranges per split, instead of just the number of token ranges per node. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
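The split-sizing idea in the comment above can be sketched as follows. This is an illustrative outline only, not code from the CASSANDRA-6091 patch; the function name and tuple layout are hypothetical:

```python
# Greedily pack one node's vnode token ranges into combined splits,
# capping each split by total *estimated rows* across its ranges
# rather than by a fixed number of ranges per node.
def combine_by_estimated_rows(ranges, max_rows_per_split):
    """ranges: list of (start_token, end_token, estimated_rows) tuples."""
    splits, current, current_rows = [], [], 0
    for rng in ranges:
        rows = rng[2]
        # Start a new split when adding this range would exceed the cap.
        if current and current_rows + rows > max_rows_per_split:
            splits.append(current)
            current, current_rows = [], 0
        current.append(rng)
        current_rows += rows
    if current:
        splits.append(current)
    return splits
```

For example, ranges with estimated rows 500, 700, 300, 600 and a cap of 1000 produce three splits (500 / 700+300 / 600), so each Hadoop task sees a bounded amount of data regardless of how many vnode ranges it covers.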
[jira] [Comment Edited] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317223#comment-14317223 ] Alex Liu edited comment on CASSANDRA-6091 at 2/11/15 11:44 PM: --- One more improvement is combining adjacent token ranges into one range. But it will create some small corner ranges, so the trade-off is not that good, as the number of adjacent token ranges is going to drop very quickly with the size of the cluster. (quoted from Piotr) was (Author: alexliu68): One more improvement is combining adjacent token ranges into one range. But it will create some small corner ranges, so the trade off is not that good as as the number of adjacent token ranges is going to drop very quickly with the size of the cluster. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317223#comment-14317223 ] Alex Liu commented on CASSANDRA-6091: - One more improvement is combining adjacent token ranges into one range. But it will create some small corner ranges, so the trade-off is not that good, as the number of adjacent token ranges is going to drop very quickly with the size of the cluster. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
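The "combining adjacent token ranges" improvement discussed in the comment above amounts to coalescing ranges that touch on the ring. A hedged sketch of the idea (illustrative only, not from the ticket; tokens simplified to plain integers with no wrap-around):

```python
# Coalesce token ranges that are adjacent on the ring, i.e. where one
# range's end token equals the next range's start token, into a single
# wider range so fewer splits are produced.
def merge_adjacent(ranges):
    """ranges: list of (start_token, end_token) tuples sorted by start."""
    if not ranges:
        return []
    merged = [ranges[0]]
    for start, end in ranges[1:]:
        last_start, last_end = merged[-1]
        if start == last_end:            # contiguous: extend previous range
            merged[-1] = (last_start, end)
        else:                            # gap: keep as a separate range
            merged.append((start, end))
    return merged
```

As the comment notes, the payoff shrinks with cluster size: with more nodes, a given node's vnode ranges are rarely adjacent on the ring, so few pairs merge.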
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: 7410-2.1-branch.txt patch on cassandra-2.1 is attached. It can be used to debug the streaming error as shown above. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.13 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284918#comment-14284918 ] Alex Liu commented on CASSANDRA-7410: - I got following sstable loading error after bulk writing. {code} DEBUG [Thread-193] 2015-01-20 16:43:53,751 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-TOC.txt to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-TOC.txt DEBUG [Thread-193] 2015-01-20 16:43:53,751 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Statistics.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Statistics.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Digest.sha1 to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Digest.sha1 DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Filter.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Filter.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Index.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Index.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming 
/var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-CompressionInfo.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-CompressionInfo.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Data.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Data.db DEBUG [Thrift:9] 2015-01-20 16:43:53,758 computing ranges for 8e56ee4c6ef2c35b3d97c6abeefe8b92 DEBUG [Thread-191] 2015-01-20 16:43:53,786 Load metadata for /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1 INFO [Thread-191] 2015-01-20 16:43:53,786 Opening /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1 (52 bytes) INFO [Thread-191] 2015-01-20 16:43:53,802 [Stream #931af290-a106-11e4-ae05-d11cec192498] Executing streaming plan for Bulk Load INFO [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,802 [Stream #931af290-a106-11e4-ae05-d11cec192498] Starting streaming to /127.0.0.1 DEBUG [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,802 [Stream #931af290-a106-11e4-ae05-d11cec192498] Sending stream init for incoming stream DEBUG [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,805 [Stream #931af290-a106-11e4-ae05-d11cec192498] Sending stream init for outgoing stream DEBUG [ACCEPT-/127.0.0.1] 2015-01-20 16:43:53,805 Connection version 2 from /127.0.0.1 DEBUG [ACCEPT-/127.0.0.1] 2015-01-20 16:43:53,806 Connection version 2 from /127.0.0.1 INFO [STREAM-INIT-/127.0.0.1:52052] 2015-01-20 16:43:53,806 [Stream #931af290-a106-11e4-ae05-d11cec192498 ID#0] Creating new streaming plan for Bulk Load DEBUG [STREAM-OUT-/127.0.0.1] 2015-01-20 
16:43:53,806 [Stream #931af290-a106-11e4-ae05-d11cec192498] Sending Prepare (0 requests, 1 files} INFO [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,807 [Stream #931af290-a106-11e4-ae05-d11cec192498 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 1 files(52 bytes) INFO [STREAM-INIT-/127.0.0.1:52052] 2015-01-20 16:43:53,807 [Stream #931af290-a106-11e4-ae05-d11cec192498, ID#0] Received streaming plan for Bulk Load INFO [STREAM-INIT-/127.0.0.1:52053] 2015-01-20 16:43:53,808 [Stream #931af290-a106-11e4-ae05-d11cec192498, ID#0] Received streaming plan for Bulk Load DEBUG [STREAM-IN-/127.0.0.1] 2015-01-20 16:43:53,808 [Stream #931af290-a106-11e4-ae05-d11cec192498] Received Prepare (0 requests, 1 files} INFO [STREAM-IN-/127.0.0.1] 2015-01-20 16:43:53,808 [Stream #931af290-a106-11e4-ae05-d11cec192498 ID#0] Prepare completed. Receiving 1 files(52 bytes), sending 0 files(0
[jira] [Issue Comment Deleted] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8577: Comment: was deleted (was: duplicate of CASSANDRA-8577) Values of set types not loading correctly into Pig -- Key: CASSANDRA-8577 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577 Project: Cassandra Issue Type: Bug Reporter: Oksana Danylyshyn Assignee: Artem Aliev Fix For: 2.1.3 Attachments: cassandra-2.1-8577.txt Values of set types are not loading correctly from Cassandra (cql3 table, Native protocol v3) into Pig using CqlNativeStorage. When using Cassandra version 2.1.0 only empty values are loaded, and for newer versions (2.1.1 and 2.1.2) the following error is received: org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94) Steps to reproduce: {code}cqlsh:socialdata> CREATE TABLE test ( key varchar PRIMARY KEY, tags set<varchar> ); cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'}); cqlsh:socialdata> select * from test; key | tags -----+--------------------------------------- key | {'Running', 'onestep4red', 'running'} (1 rows){code} With version 2.1.0: {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage(); grunt> dump data; (key,()){code} With version 2.1.2: {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage(); grunt> dump data; org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94) at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27) at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796) at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195) at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code} Expected result: {code}(key,(Running,onestep4red,running)){code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195) at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code} Expected result: {code}(key,(Running,onestep4red,running)){code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277952#comment-14277952 ] Alex Liu edited comment on CASSANDRA-8577 at 1/15/15 12:09 AM:
---
duplicate of CASSANDRA-8577

was (Author: alexliu68): duplicate of CASSANDRA-8577

> Values of set types not loading correctly into Pig
> --------------------------------------------------
>
>                 Key: CASSANDRA-8577
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Oksana Danylyshyn
>            Assignee: Artem Aliev
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.1-8577.txt
>
> Values of set types are not loading correctly from Cassandra (cql3 table, Native protocol v3) into Pig using CqlNativeStorage.
> When using Cassandra version 2.1.0 only empty values are loaded, and for newer versions (2.1.1 and 2.1.2) the following error is received:
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> Steps to reproduce:
> {code}
> cqlsh:socialdata> CREATE TABLE test ( key varchar PRIMARY KEY, tags set<varchar> );
> cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'});
> cqlsh:socialdata> select * from test;
>  key | tags
> -----+---------------------------------------
>  key | {'Running', 'onestep4red', 'running'}
> (1 rows)
> {code}
> With version 2.1.0:
> {code}
> grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> (key,())
> {code}
> With version 2.1.2:
> {code}
> grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27)
> at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
> at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
> at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> {code}
> Expected result:
> {code}
> (key,(Running,onestep4red,running))
> {code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
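The failure mode above is consistent with a length-width mismatch between native protocol versions: v3 moved collection sizes and element lengths from 2-byte unsigned shorts to 4-byte ints, so a reader that still assumes the old widths stops early and leaves unread bytes behind. A minimal self-contained sketch of that mismatch (plain Java, illustrative only, not Cassandra's actual SetSerializer):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class SetWidthMismatch {
    // Encode a set of strings using v3-style 4-byte count and element lengths.
    static ByteBuffer encodeV3(Set<String> values) {
        int size = 4;
        List<byte[]> encoded = new ArrayList<>();
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            size += 4 + b.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(encoded.size());          // v3: count is an int
        for (byte[] b : encoded) {
            out.putInt(b.length);            // v3: each length is an int
            out.put(b);
        }
        out.flip();
        return out;
    }

    // Decode with v2-style 2-byte widths; returns true if bytes remain after
    // the declared elements were consumed -- the "extraneous bytes" situation.
    static boolean v2DecodeLeavesExtraneousBytes(ByteBuffer in) {
        ByteBuffer b = in.duplicate();
        int n = b.getShort() & 0xFFFF;       // v2: count is an unsigned short
        for (int i = 0; i < n; i++) {
            int len = b.getShort() & 0xFFFF; // v2: each length is a short
            b.position(b.position() + len);
        }
        return b.hasRemaining();
    }
}
```

Reading the first two bytes of the 4-byte count `3` yields `0`, so the v2-style reader consumes nothing and all the element bytes are left over.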
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576:
Attachment: 8576-2.1-branch.txt

v1 patch is attached; it only supports full-partition-key EQ queries.

> Primary Key Pushdown For Hadoop
> -------------------------------
>
>                 Key: CASSANDRA-8576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>         Attachments: 8576-2.1-branch.txt
>
> I've heard reports from several users that they would like to have predicate pushdown functionality for Hadoop-based (Hive in particular) services.
> Example use case:
> * Table with wide partitions, one per customer
> * Application team has HQL they would like to run on a single customer
> * Currently, time to complete scales with the number of customers, since the Input Format can't push down the primary key predicate
> * Current implementation requires a full table scan (since it can't recognize that a single partition was specified)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277952#comment-14277952 ] Alex Liu commented on CASSANDRA-8577:
---
duplicate of CASSANDRA-8577

> Values of set types not loading correctly into Pig
> (full ticket description quoted in the [Comment Edited] message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276140#comment-14276140 ] Alex Liu commented on CASSANDRA-8576:
---
Should it work only for EQ predicates? Should it also include IN predicates?

> Primary Key Pushdown For Hadoop
> (full ticket description quoted in the [Updated] message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
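To make the EQ-versus-IN question concrete, here is a hypothetical sketch (plain Java; the helper names are illustrative, not the attached patch's API) of the two predicate shapes: a full-partition-key EQ restricts the job to a single partition, while an IN on the partition key is equivalent to a batch of EQ queries, one per listed value.

```java
import java.util.*;

public class PushdownSketch {
    // Build a CQL WHERE clause from an EQ predicate on every partition key column.
    static String eqWhere(LinkedHashMap<String, String> keyEq) {
        StringJoiner sj = new StringJoiner(" AND ");
        for (Map.Entry<String, String> e : keyEq.entrySet())
            sj.add(e.getKey() + " = '" + e.getValue() + "'");
        return "WHERE " + sj;
    }

    // An IN predicate on the partition key expands to one EQ query per value.
    static List<String> expandIn(String column, List<String> values) {
        List<String> queries = new ArrayList<>();
        for (String v : values)
            queries.add("WHERE " + column + " = '" + v + "'");
        return queries;
    }
}
```

Either shape lets the input format target specific partitions instead of scanning every split, which is exactly the scaling problem the ticket describes.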
[jira] [Commented] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274134#comment-14274134 ] Alex Liu commented on CASSANDRA-8599:
---
I am removing CS and refactoring CNS.

> Refactor or fix CqlStorage
> --------------------------
>
>                 Key: CASSANDRA-8599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8599
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Brandon Williams
>            Assignee: Alex Liu
>             Fix For: 2.1.3
>
> In CASSANDRA-8541, for 2.1, I ultimately replaced the non-existent CPIF references with CIF, since CNS was broken otherwise. But this means CS no longer works, since it's not a simple drop-in replacement (though having CNS work is better than having them both broken by a class that doesn't exist). We can't just deprecate and remove CS either, because CNS extends it. We either need to fix CS to work with CIF, or we need to refactor CNS so that we can just remove CS.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8166_2.1_branch.txt

v1 patch on the 2.1 branch is attached; it makes CqlStorage a wrapper over CqlNativeStorage to retain backward compatibility.

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
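A minimal stand-in sketch of that wrapping approach (the class names echo the ticket, but these bodies are illustrative placeholders, not the attached patch): the deprecated CqlStorage becomes a thin shell over CqlNativeStorage, so existing Pig scripts keep working while emitting a warning on first use.

```java
public class WrapSketch {
    // Placeholder standing in for the real CqlNativeStorage LoadFunc.
    static class CqlNativeStorage {
        String location;
        void setLocation(String location) { this.location = location; }
    }

    /** @deprecated use {@link CqlNativeStorage} directly. */
    @Deprecated
    static class CqlStorage extends CqlNativeStorage {
        static boolean warned = false;

        @Override
        void setLocation(String location) {
            if (!warned) {
                // Warn once, then delegate everything to the new storage class.
                System.err.println("CqlStorage is deprecated; use CqlNativeStorage");
                warned = true;
            }
            super.setLocation(location);
        }
    }
}
```

Inverting the inheritance this way (CS extending CNS, rather than CNS extending CS as before) is what lets CS be removed later without breaking CNS.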
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8599-2.1-branch.txt

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: (was: 8166_2.1_branch.txt)

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: (was: 8599-2.1-branch.txt)

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8599-2.1-branch.txt

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8599-v2-2.1-branch.txt

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274370#comment-14274370 ] Alex Liu commented on CASSANDRA-8599:
---
V2 is attached; it adds a deprecation warning message and combines imports.

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8541) References to non-existent/deprecated CqlPagingInputFormat in code
[ https://issues.apache.org/jira/browse/CASSANDRA-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266858#comment-14266858 ] Alex Liu commented on CASSANDRA-8541:
---
Please roll back the change to CqlStorage, which supports using CqlPagingInputFormat; CqlNativeStorage uses CqlInputFormat.

> References to non-existent/deprecated CqlPagingInputFormat in code
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-8541
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8541
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Rekha Joshi
>            Assignee: Rekha Joshi
>              Labels: hadoop
>             Fix For: 2.0.12
>
>         Attachments: CASSANDRA-8541.txt
>
> On Mac 10.9.5, Java 1.7, latest cassandra trunk: references to the non-existent/deprecated CqlPagingInputFormat remain in the code. As per Changes.txt/7570, both CqlPagingInputFormat and CqlPagingRecordReader were removed, but lingering references remain in WordCount, CqlStorage, ..
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8486) Can't authenticate using CqlRecordReader
[ https://issues.apache.org/jira/browse/CASSANDRA-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247212#comment-14247212 ] Alex Liu commented on CASSANDRA-8486:
---
Try setting it as:
{code}
ConfigHelper.setInputKeyspaceUserNameAndPassword(conf, username, password);
CqlConfigHelper.setInputNativeAuthProvider(conf, PlainTextAuthProvider.class.getName());
{code}

> Can't authenticate using CqlRecordReader
> ----------------------------------------
>
>                 Key: CASSANDRA-8486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8486
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Cyril Scetbon
>            Assignee: Alex Liu
>
> Using CqlPagingRecordReader, it was possible to use authentication to connect to the Cassandra cluster, but now that we only have CqlRecordReader we can't anymore. We should put [this code|https://github.com/apache/cassandra/blob/cassandra-2.0.9/src/java/org/apache/cassandra/hadoop/cql3/CqlPagingRecordReader.java#L140-L153] back in CqlRecordReader.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
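For context, a fuller job-setup fragment around those two calls might look like the following (a sketch assuming Cassandra 2.1's Hadoop helper classes; the address, keyspace, table, and credentials are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import com.datastax.driver.core.PlainTextAuthProvider;

// 'job' is your org.apache.hadoop.mapreduce.Job instance.
Configuration conf = job.getConfiguration();
ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
ConfigHelper.setInputColumnFamily(conf, "mykeyspace", "mytable");
// Thrift-level credentials, still consulted by parts of the input format:
ConfigHelper.setInputKeyspaceUserNameAndPassword(conf, "user", "pass");
// Auth provider class for the native-protocol connection used by CqlRecordReader:
CqlConfigHelper.setInputNativeAuthProvider(conf, PlainTextAuthProvider.class.getName());
```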
[jira] [Commented] (CASSANDRA-7083) Authentication Support for CqlRecordWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200579#comment-14200579 ] Alex Liu commented on CASSANDRA-7083:
---
+1

> Authentication Support for CqlRecordWriter
> ------------------------------------------
>
>                 Key: CASSANDRA-7083
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7083
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Henning Kropp
>            Assignee: Brandon Williams
>              Labels: authentication, pig
>             Fix For: 2.0.12
>
>         Attachments: auth_cql.patch
>
> The {{CqlRecordWriter}} seems not to support authentication. When the keyspace in Cassandra is set to use authentication, our Pig job fails with the following when credentials are provided in the URI ('cql://username:password...):
> {code}
> java.lang.RuntimeException: InvalidRequestException(why:You have not logged in)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:123)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:90)
> at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:76)
> at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:57)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: InvalidRequestException(why:You have not logged in)
> at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:38677)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1597)
> at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1582)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.retrievePartitionKeyValidator(CqlRecordWriter.java:332)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:108)
> ... 7 more
> {code}
> If the credentials are supplied not in the URI but only in the {{JobConf}}, the exception is:
> {code}
> Output Location Validation Failed for: 'cql://...' More info to follow:
> InvalidRequestException(why:You have not logged in)
> at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$
> {code}
> This led to the finding that authentication is correctly supplied for {{CqlStorage}} but not for the {{CqlRecordWriter}}. Maybe it would make sense to put the authentication part into {{ConfigHelper.getClientFromAddressList()}}? Then in {{CqlStorage}} the username and password in the conf would need to be set from the URI. If so, the {{ConfigHelper}} has all the information to authenticate and already returns the client.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8166) Not all data is loaded to Pig using CqlNativeStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181550#comment-14181550 ] Alex Liu commented on CASSANDRA-8166:
---
yes

> Not all data is loaded to Pig using CqlNativeStorage
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8166
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Oksana Danylyshyn
>            Assignee: Alex Liu
>         Attachments: 8166_2.1_branch.txt, pig_header, pig_schema, sorted.zip
>
> Not all the data from the Cassandra table is loaded into Pig using the CqlNativeStorage function.
> Steps to reproduce:
> cql3 create table statement:
> {code}
> CREATE TABLE time_bucket_step (
>   key varchar,
>   object_id varchar,
>   value varchar,
>   PRIMARY KEY (key, object_id)
> );
> {code}
> Loading and saving data to Cassandra (the sorted file is in the attachment):
> {code}
> time_bucket_step = load 'sorted' using PigStorage('\t') as (key:chararray, object_id:chararray, value:chararray);
> records = foreach time_bucket_step generate TOTUPLE(TOTUPLE('key', key),TOTUPLE('object_id', object_id)), TOTUPLE(value);
> store records into 'cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> {code}
> Results:
> Input(s): Successfully read 139026 records (5817 bytes) from: hdfs://.../sorted
> Output(s): Successfully stored 139026 records in: cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F
> Loading data from Cassandra (note that not all data are read):
> {code}
> time_bucket_step_cass = load 'cql://socialdata/time_bucket_step' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> store time_bucket_step_cass into 'time_bucket_step_cass' using PigStorage('\t','-schema');
> {code}
> Results:
> Input(s): Successfully read 80727 records (20068 bytes) from: cql://socialdata/time_bucket_step
> Output(s): Successfully stored 80727 records (2098178 bytes) in: hdfs:///time_bucket_step_cass
> Actual: only 80727 of 139026 records were loaded
> Expected: All data should be loaded
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8166) Not all data is loaded to Pig using CqlNativeStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180459#comment-14180459 ] Alex Liu commented on CASSANDRA-8166:
---
Did you check cqlsh for the result count?
{code}
select count(*) from socialdata.time_bucket_step;
{code}

> Not all data is loaded to Pig using CqlNativeStorage
> (full ticket description quoted in the other CASSANDRA-8166 message above; at the time of this comment the only attachment was sorted.zip, and the load statement read {{PigStorage('\t','-schema')}})
--
This message was sent by Atlassian JIRA (v6.3.4#6332)