[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993711#comment-13993711 ]

Alex McLintock commented on CASSANDRA-4131:
---

I can't access https://github.com/riptano/hive at all (404 error). I'd love to know the status of this... Is it true that Cassandra can be used as a data source for Hadoop's Hive, but only if I use the DataStax version of Cassandra?

Integrate Hive support to be in core cassandra
--
Key: CASSANDRA-4131
URL: https://issues.apache.org/jira/browse/CASSANDRA-4131
Project: Cassandra
Issue Type: Improvement
Reporter: Jeremy Hanna
Assignee: Edward Capriolo
Labels: hadoop, hive

The standalone hive support (at https://github.com/riptano/hive) would be great to have in-tree so that people don't have to go out to github to download it and wonder if it's a left-for-dead external shim.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778823#comment-13778823 ]

Marcel commented on CASSANDRA-4131:
---

Will there be an update of the cassandra-handler for Cassandra 2.0.0? I have it working on Cassandra 2.0.0 (with a slight problem connecting to the Cassandra cluster), but I'm noticing that the number of mappers is equal to the number of vnodes (default 256 per node). I think this should be equal to the number of nodes.

The problem with connecting to the Cassandra cluster was that the cassandra.host property passed in the create table statement in Hive didn't get passed on. When I hardcoded it (replaced localhost with the IP address) it worked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760054#comment-13760054 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

It's odd that this issue has had Major priority since April 2012, and that there is now a DataStax version that is not included in trunk...
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759019#comment-13759019 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

I followed your README (I had been doing almost the same things) and I got the same error: http://pastebin.com/KTRPx2Fh. As you can see, I got no error when creating the column family "messages", which didn't exist beforehand.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757919#comment-13757919 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

I dug into the code. The error comes from the fact that in class org.apache.hadoop.hive.cassandra.input.cql.CqlHiveRecordReader, createKey() returns a MapWritable object, whereas in class org.apache.hadoop.hive.ql.exec.FetchOperator, getRecordReader() tries to get the key with the following code:
{code}key = currRecReader.createKey();{code}
But key is declared as a WritableComparable, so it can't hold the MapWritable object returned by CqlHiveRecordReader.createKey().

Tell me if I'm wrong, or if you have some patches to apply.
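The type mismatch described in this comment can be reproduced in isolation. The following is only a minimal sketch: the interfaces and classes are hypothetical stand-ins for the Hadoop and handler types named above (so that it compiles without Hadoop on the classpath), and it is not the actual Hive or handler code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for org.apache.hadoop.io.Writable and
// org.apache.hadoop.io.WritableComparable.
interface Writable {}
interface WritableComparable<T> extends Writable, Comparable<T> {}

// Stand-in for MapWritable: it is a Writable, but it is not comparable.
class MapWritableStandIn implements Writable {
    final Map<String, String> entries = new HashMap<>();
}

// Stand-in for CqlHiveRecordReader: its createKey() hands back a map.
class CqlReaderStandIn {
    Object createKey() {
        return new MapWritableStandIn();
    }
}

final class KeyCastSketch {
    // Mimics a caller that insists on a WritableComparable key.
    static String describe(Object key) {
        try {
            WritableComparable<?> k = (WritableComparable<?>) key;
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // The map-typed key cannot be used where a comparable key is required.
        System.out.println(describe(new CqlReaderStandIn().createKey()));
    }
}
```

Under these assumptions, either the reader would have to return a key type that implements WritableComparable, or the caller would have to accept a plain Writable; which of the two fits the real code is not settled in this thread.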
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758761#comment-13758761 ]

Rohit Rai commented on CASSANDRA-4131:
---

Sorry, I haven't been able to give this my full attention in the past month. One of our developers is working on it... and with the latest merge, most of the functionality should be working. You can try it with this document: https://github.com/milliondreams/hive/blob/cas-support-cql/cassandra-handler/README

Cyril, in the meantime, looking at the code, I also see what you are describing; we will try to figure it out. Give it a try with the README and let us know how that goes.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749442#comment-13749442 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

Okay, I'm using your code, but I was using CassandraStorageHandler instead of CqlStorageHandler in the Hive DDL command :( Is there documentation somewhere that describes how to use your driver?

I don't have any issue when I use CassandraStorageHandler with a cql2 table, but when I use CqlStorageHandler with a cql3 table I get:

java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable cannot be cast to org.apache.hadoop.io.WritableComparable
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)

I use Hadoop 1.2.1, as Hadoop 2 is not supported. Is it related to my Hadoop version?
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748762#comment-13748762 ]

Rohit Rai commented on CASSANDRA-4131:
---

What do you mean by not being able to see the external table? We are in fact using a CQL query to get the CF names:

select columnfamily_name from system.schema_columnfamilies where keyspace_name='%s';

Are you looking at the cql branch? https://github.com/milliondreams/hive/blob/cas-support-cql
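As an illustration of how a query template like the one quoted above might be filled in, here is a small sketch. The class and method names are hypothetical, and the identifier check is an addition made for this sketch, not something the handler is known to do.

```java
final class SchemaQuerySketch {
    // The template is the query quoted in the comment above.
    static final String QUERY_TEMPLATE =
        "select columnfamily_name from system.schema_columnfamilies where keyspace_name='%s';";

    // Builds the lookup query for one keyspace.
    static String columnFamilyQuery(String keyspace) {
        // Only plain identifiers are accepted, so the name can be embedded
        // in the string literal without quoting issues (sketch-only check).
        if (keyspace == null || !keyspace.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid keyspace name: " + keyspace);
        }
        return String.format(QUERY_TEMPLATE, keyspace);
    }

    public static void main(String[] args) {
        System.out.println(columnFamilyQuery("my_keyspace"));
    }
}
```

Because this reads the schema through CQL rather than through Thrift's describe_keyspace, CQL3 tables would be expected to show up in the result.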
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748767#comment-13748767 ]

Rohit Rai commented on CASSANDRA-4131:
---

On another note, we now have Select and Insert working. We also support Create table when the table doesn't exist in C*.

I noticed this blog post: http://www.planetcassandra.org/blog/post/support-cql3-tables-in-hadoop-pig-and-hive

Is this only in DSE? Does DataStax plan to release it?
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745060#comment-13745060 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

The tests I ran show that CQL3 tables are not seen when I try to create the external table in Hive. This is because Thrift does not return CQL3 tables, and you use it (through describe_keyspace) to get the column family definitions.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743937#comment-13743937 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

As for the performance issue, I suppose it's simply inherent to Hadoop's internals, which are designed for a lot of data and not for a few rows.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739862#comment-13739862 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

I've added 2 commits, available at https://github.com/cscetbon/hive, which:
- set the default partitioner to Murmur3
- skip deleted rows when reading from Cassandra

I also sent you a pull request on GitHub.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738387#comment-13738387 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

I've hit a performance issue when there is only a small amount of data. In my example, I have only a few rows:
{code}
cqlsh> select count(*) from light_column;

 count
-------
     4
{code}
It takes less than a second with cqlsh, whereas it takes nearly 600 seconds with Hive. Please see the logs at http://pastebin.com/ippy96GY

There are 257 mappers (to scan data from 256 vnodes) and they used a lot of CPU, even though the process says at the end: *Total MapReduce CPU Time Spent: 0 msec*

Another issue is that the count is wrong: it returns 5 instead of 4.
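One way to picture the mapper count reported here is that each vnode token range becomes its own input split. The sketch below shows how grouping ranges by owning node would reduce the number of splits to the number of nodes. It is purely illustrative: the TokenRange class, the host names, and the grouping step are assumptions made for this example, not how the handler or Cassandra's input format actually builds splits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitGroupingSketch {
    // Hypothetical description of one vnode token range and its owning node.
    static final class TokenRange {
        final String host;
        final long start;
        final long end;

        TokenRange(String host, long start, long end) {
            this.host = host;
            this.start = start;
            this.end = end;
        }
    }

    // Collects all ranges owned by the same host into a single group,
    // so that one map task could scan every range of that node.
    static Map<String, List<TokenRange>> groupByHost(List<TokenRange> ranges) {
        Map<String, List<TokenRange>> byHost = new LinkedHashMap<>();
        for (TokenRange range : ranges) {
            List<TokenRange> owned = byHost.get(range.host);
            if (owned == null) {
                owned = new ArrayList<>();
                byHost.put(range.host, owned);
            }
            owned.add(range);
        }
        return byHost;
    }

    // Builds made-up ranges: vnodesPerNode consecutive ranges for each node.
    static List<TokenRange> demoRanges(int nodes, int vnodesPerNode) {
        List<TokenRange> ranges = new ArrayList<>();
        long token = 0;
        for (int n = 0; n < nodes; n++) {
            for (int v = 0; v < vnodesPerNode; v++) {
                ranges.add(new TokenRange("node" + n, token, token + 1));
                token++;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<TokenRange> ranges = demoRanges(3, 256);
        System.out.println(ranges.size() + " ranges, "
            + groupByHost(ranges).size() + " groups");
    }
}
```

With one split per range, a 3-node cluster with 256 vnodes per node would start 768 map tasks; with one group per node it would start 3. Whether grouping is appropriate depends on data volume, since larger tables benefit from the extra parallelism.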
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733490#comment-13733490 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

Have you had more time to test it? Is there any documentation that describes how to use it?
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699805#comment-13699805 ]

Rohit Rai commented on CASSANDRA-4131:
---

Sorry for the mess there. I was trying to port CFS and the Hive metastore too, but those tests don't work right now, so I have put that on hold. Getting it to work with CQL3 Column Families is a priority for me right now, so I will come back to those later.

For just the Hive handler, please look at the (cas-support-simple-hive) branch: https://github.com/milliondreams/hive/tree/cas-support-simple-hive

All the test cases (however few there were) pass there, and it is working perfectly with Thrift/Compact storage Column Families.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699835#comment-13699835 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

bq. getting it to work with CQL3 Column Families is a priority for me right now

Okay, that's exactly the feature I'm waiting for :) You may find inspiration in [CASSANDRA-5234|https://issues.apache.org/jira/browse/CASSANDRA-5234], for example for paging.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700294#comment-13700294 ]

Rohit Rai commented on CASSANDRA-4131:
---

Actually, the Hive support internally uses the Cassandra Hadoop input format... and thankfully we now have CqlPagingInputFormat support in 1.2.6.

So I have basic CQL3 column family support (reading) in, and it is working. I haven't done extensive testing and still need to write some test cases, but I could run it against CQL Column Families with simple as well as composite primary keys.

The code is here if you want to give it a try: https://github.com/milliondreams/hive/tree/cas-support-cql
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698686#comment-13698686 ]

Rohit Rai commented on CASSANDRA-4131:
---

So I figured it out... This is caused by a difference in partitioners. I am using the Murmur3 partitioner on my CF, but since I didn't specify it during table creation in Hive, the Hive input format defaults to the Random partitioner! My bad, I didn't notice it while testing in Map/Reduce because I set the partitioner there.

I will be testing it extensively over the next few weeks, including one deployment in production ;) I will update if there are any more issues.

Is there any interest in getting this into Cassandra? If yes, I can make a patch and submit it.

On second thought, shouldn't it default to Murmur3, since that is the default partitioner now?
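The defaulting behaviour suggested at the end of this comment could look roughly like the sketch below. The property key "cassandra.partitioner" and the helper class are assumptions made for illustration (check the handler's README for the real key); only the partitioner class name org.apache.cassandra.dht.Murmur3Partitioner is taken from Cassandra itself.

```java
import java.util.HashMap;
import java.util.Map;

final class PartitionerDefaultSketch {
    // Assumed table property key; the handler may use a different one.
    static final String PARTITIONER_KEY = "cassandra.partitioner";
    // Cassandra's default partitioner since 1.2.
    static final String MURMUR3 = "org.apache.cassandra.dht.Murmur3Partitioner";

    // Returns the configured partitioner, falling back to Murmur3 when the
    // table definition does not name one.
    static String resolvePartitioner(Map<String, String> tableProperties) {
        String configured = tableProperties.get(PARTITIONER_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return MURMUR3;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        System.out.println(resolvePartitioner(props));
        props.put(PARTITIONER_KEY, "org.apache.cassandra.dht.RandomPartitioner");
        System.out.println(resolvePartitioner(props));
    }
}
```

A cluster still running the Random partitioner would then have to name it explicitly in the Hive table definition, which is the reverse of the situation described above.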
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698712#comment-13698712 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

Great! I think it should default to Murmur3, as it's built for 1.2.6.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698911#comment-13698911 ]

Nicolas Lalevée commented on CASSANDRA-4131:
---

FYI, I have used the Hive/Cassandra storage patch for some time now, only for writes, and I had to modify some lines to make it work properly. I never took the time to figure out whether it was because of my environment or whether this is a real bug. Maybe you should look into it.

It is about the precision of the timestamp. In the patches, the timestamp is set to {{System.currentTimeMillis()}}. As far as I understand, the Cassandra command-line client uses microsecond precision. So once a write has been made from the command-line client, every later write from Hive to the same data is ignored, because its millisecond timestamp is always smaller.

For instance, here are the lines which I patched:
https://github.com/milliondreams/hive/blob/cas-support/cassandra-handler/src/main/java/org/apache/hadoop/hive/cassandra/serde/RegularTableMapping.java#L84
https://github.com/milliondreams/hive/blob/cas-support/cassandra-handler/src/main/java/org/apache/hadoop/hive/cassandra/serde/RegularTableMapping.java#L94
https://github.com/milliondreams/hive/blob/cas-support/cassandra-handler/src/main/java/org/apache/hadoop/hive/cassandra/serde/TransposedMapping.java#L45
https://github.com/milliondreams/hive/blob/cas-support/cassandra-handler/src/main/java/org/apache/hadoop/hive/cassandra/serde/TransposedMapping.java#L63

Instead I used:
{{cc.setTimeStamp(FBUtilities.timestampMicros());}}
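The effect of mixing timestamp units can be shown with a simplified last-write-wins comparison. This is only a sketch: the resolve helper is hypothetical and ignores how Cassandra breaks ties, and the microsecond value is derived by multiplying milliseconds by 1000 rather than by calling the FBUtilities method mentioned above.

```java
final class TimestampSketch {
    // Simplified last-write-wins rule: the value with the larger
    // timestamp survives; on a tie the first value is kept.
    static String resolve(String first, long firstTs, String second, long secondTs) {
        return secondTs > firstTs ? second : first;
    }

    public static void main(String[] args) {
        long nowMillis = System.currentTimeMillis();

        // An earlier write from a client that stamps in microseconds.
        long cliMicros = nowMillis * 1000L;
        // A write one minute later, stamped in milliseconds.
        long hiveMillis = nowMillis + 60_000L;
        // The same later write, stamped in microseconds.
        long hiveMicros = hiveMillis * 1000L;

        // The later write loses, because its number is about 1000x smaller.
        System.out.println(resolve("cli", cliMicros, "hive", hiveMillis)); // prints cli
        // With matching units the later write wins as expected.
        System.out.println(resolve("cli", cliMicros, "hive", hiveMicros)); // prints hive
    }
}
```

This matches the behaviour reported in the comment: switching the Hive writes to microsecond timestamps puts both writers on the same scale.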
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699082#comment-13699082 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

Do all tests run without issues? I can't make them run without errors (Java 6 or 7):

Tests in error:
  CassandraFileSystemTest.testFileSystemWithoutFlush:63->testFileSystem:74 » IO ...
  CassandraFileSystemTest.testFileSystemWithFlush:68->testFileSystem:74 » IO org...
  CassandraHiveMetaStoreTest.testSetConf:32 » CassandraHiveMetaStore There was a...
  CassandraHiveMetaStoreTest.testCreateDeleteDatabaseAndTable:52 » CassandraHiveMetaStore
  CassandraHiveMetaStoreTest.testFindEmptyPatitionList:78 » CassandraHiveMetaStore
  CassandraHiveMetaStoreTest.testAlterTable:99 » CassandraHiveMetaStore There wa...
  CassandraHiveMetaStoreTest.testAlterDatabaseTable:122 » CassandraHiveMetaStore
  CassandraHiveMetaStoreTest.testAddParition:144 » CassandraHiveMetaStore There ...
  CassandraHiveMetaStoreTest.testCreateMultipleDatabases:174 » CassandraHiveMetaStore
  CassandraHiveMetaStoreTest.testAddDropReAddDatabase:186 » CassandraHiveMetaStore
  CassandraHiveMetaStoreTest.testCaseInsensitiveNaming:207 » CassandraHiveMetaStore
  CassandraHiveMetaStoreTest.testAutoCreateFromKeyspace:229 » TTransport java.ne...
  MetaStorePersisterTest.testBasicPersistMetaStoreEntity:52->setupClient:44 » CassandraHiveMetaStore
  MetaStorePersisterTest.testEntityNotFound » Unexpected exception, expectedor...
  MetaStorePersisterTest.testBasicLoadMetaStoreEntity:73->setupClient:44 » CassandraHiveMetaStore
  MetaStorePersisterTest.testFindMetaStoreEntities:89->setupClient:44 » CassandraHiveMetaStore
  MetaStorePersisterTest.testEntityDeletion:116->setupClient:44 » CassandraHiveMetaStore
  SchemaManagerServiceTest.setupLocal:45 » CassandraHiveMetaStore There was a pr...
  SchemaManagerServiceTest.setupLocal:45 » CassandraHiveMetaStore There was a pr...
  SchemaManagerServiceTest.setupLocal:45 » CassandraHiveMetaStore There was a pr...
  SchemaManagerServiceTest.setupLocal:45 » CassandraHiveMetaStore There was a pr...
  SchemaManagerServiceTest.setupLocal:45 » CassandraHiveMetaStore There was a pr...
  SchemaManagerServiceTest.setupLocal:45 » CassandraHiveMetaStore There was a pr...

Tests run: 37, Failures: 0, Errors: 23, Skipped: 0
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698159#comment-13698159 ]

Rohit Rai commented on CASSANDRA-4131:
---

So, I did get some time today to debug this... They are not tombstones, as the duplicates are present even for rows that have never been edited. EVERY row is repeated.

I tried using the CFIF and HiveCassandraStandardColumnInputFormat directly in a Map/Reduce program, but they didn't give the duplicates. So it must be something in the CassandraStorageHandler. I will look into it more later.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696646#comment-13696646 ]

Cyril Scetbon commented on CASSANDRA-4131:
---

Are the duplicates just unfiltered tombstones, as described at https://issues.apache.org/jira/browse/CASSANDRA-4421?focusedCommentId=13658450&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13658450 ?
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696419#comment-13696419 ] Rohit Rai commented on CASSANDRA-4131: -- Hey, I managed to get the Hive Cassandra handler to compile against 1.2.6, and all the test cases for it from the DataStax Hive repo are passing. The code is here: https://github.com/milliondreams/hive/tree/cas-support/cassandra-handler I am facing the same issue that Oliver and Cyril mention, with the same row appearing twice in a mapped column. I will start debugging that tomorrow, but it would be great if someone could point me in the right direction.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669084#comment-13669084 ] Cyril Scetbon commented on CASSANDRA-4131: -- Any news about this last issue with duplicate rows?
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649749#comment-13649749 ] Oliver Zhou commented on CASSANDRA-4131:

Hi Dmitry, I tried your build with Cassandra 1.2.3 / Hive 0.9.0, and I always get duplicated records in Hive.

Cassandra column family:

    CREATE COLUMN FAMILY users
    WITH comparator = UTF8Type
    AND key_validation_class = UTF8Type
    AND column_metadata = [
      {column_name: full_name, validation_class: UTF8Type}
      {column_name: email, validation_class: UTF8Type}
      {column_name: state, validation_class: UTF8Type}
      {column_name: gender, validation_class: UTF8Type}
      {column_name: birth_year, validation_class: LongType}
    ];

Hive table:

    CREATE EXTERNAL TABLE IF NOT EXISTS users (key string, full_name string)
    STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
    WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key,users:full_name", "cassandra.cf.name" = "users")
    TBLPROPERTIES ("cassandra.ks.name" = "ks33");

Hive queries:

    select * from users;
    select count(1) from users;

The first query always returns duplicated rows (one row appears twice). The second returns 2, although I inserted exactly one row. Do you have any idea why this happens?
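One way to check whether every key is doubled, rather than only some rows, is to export the query output and count how often each key occurs. The following is a minimal Python sketch and not part of the storage handler. It assumes the output of a command such as hive -e 'select * from users' is tab-separated with the row key in the first field; the function names count_keys and duplicated_keys are hypothetical.

```python
import sys
from collections import Counter
from typing import Dict, Iterable


def count_keys(lines: Iterable[str]) -> Counter:
    """Count how often each row key (first tab-separated field) occurs."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines in the export
        key = line.split("\t", 1)[0]
        counts[key] += 1
    return counts


def duplicated_keys(lines: Iterable[str]) -> Dict[str, int]:
    """Return only the keys that occur more than once, with their counts."""
    return {key: n for key, n in count_keys(lines).items() if n > 1}


if __name__ == "__main__":
    # Assumed usage: pipe tab-separated query output into this script.
    for key, n in sorted(duplicated_keys(sys.stdin).items()):
        print(f"{key}\t{n}")
```

If every key is reported with the same count, the duplication is systematic (for example, the same data being read twice) rather than tied to particular rows.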
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632206#comment-13632206 ] Dmitry Vasilenko commented on CASSANDRA-4131: - This can be of some interest: https://github.com/dvasilen/Hive-Cassandra/blob/HIVE-0.10.0-CASSANDRA-1.2.4/release/hive-0.10.0-cassandra-1.2.4.jar https://github.com/dvasilen/Hive-Cassandra/blob/HIVE-0.9.0-CASSANDRA-1.2.4/release/hive-0.9.0-cassandra-1.2.4.jar I was testing Cassandra 1.2.3/Hive 0.10.0/HCatalog 0.5.0 and had to recompile the code of the storage handler to make it work with the latest versions.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632214#comment-13632214 ] Jonathan Ellis commented on CASSANDRA-4131: --- Is that from Jake's branch? I'm kind of surprised if you didn't need more than a recompile.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632372#comment-13632372 ] Dmitry Vasilenko commented on CASSANDRA-4131: - I had to refactor the code slightly to conform to the new APIs but other than that it was relatively straightforward.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591956#comment-13591956 ] Christian Moen commented on CASSANDRA-4131: --- I'm also curious to know what the plans are here. Thanks for any info.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583263#comment-13583263 ] Chris Romary commented on CASSANDRA-4131: - Wondering about the status of Cassandra/Hive integration. Is it a 'left-for-dead external shim' or something that's still actively being worked on? None of the GitHub repositories mentioned above have commits in the last 10 months or so. What is the status of Hive with Cassandra 1.2?
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249893#comment-13249893 ] Edward Capriolo commented on CASSANDRA-4131: Yes. We need to get this in tree somewhere, whether in Hive or in Cassandra. Keeping it external really kills our evolution and makes it hard to manage projects such as https://github.com/edwardcapriolo/hive_cassandra_udfs I think we should put this code in Cassandra's tree and use Hive from Maven to build it. I also have a test kit, https://github.com/edwardcapriolo/hive_test , with which we can bring up an embedded Hive for integration testing. What versions of Cassandra and Hive should we target? I will assign this to myself for now because I am very interested in seeing this happen.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249899#comment-13249899 ] T Jake Luciani commented on CASSANDRA-4131: --- I think most of the work will be making a stand-alone build.xml to fetch the Hive Maven artifacts and create the cassandra-handler.jar. I think we just drop the Hive test suite and integrate our own.
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249945#comment-13249945 ] Edward Capriolo commented on CASSANDRA-4131: Agreed. What versions of Hive and Cassandra should we target? Hive 0.8.0 and Cassandra 1.1.0? Where exactly do the latest Hive handler sources live?
[jira] [Commented] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249961#comment-13249961 ] T Jake Luciani commented on CASSANDRA-4131: --- The latest code is at https://github.com/riptano/hive/tree/hive-0.8.1-merge The Cassandra version should be trunk (1.1), since it uses the same version of Thrift as Hive 0.7.0. The only thing I want to do is put the CassandraProxyClient code into the main Cassandra tree and use that for Hadoop calls, since it's much more reliable for us. The Hive driver currently depends on its own version of that class.