[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055974#comment-14055974 ] James Taylor commented on PHOENIX-1056: --- Thanks, [~jaywong]. That's a good improvement to build both the table data and the index data in a single job. Open issues are:
- Do we need both a CSV bulk loader and an ImportTsv tool? How are they different? Or can the improvements you made be folded into the CSV bulk loader instead? If we do need both, can the ImportTsv tool be built on top of the CSV bulk loader?
- The CSV bulk loader uses publicly exposed Phoenix APIs to get at the underlying KeyValues and uses the Phoenix table metadata to drive the import, while the ImportTsv tool requires the column information to be passed through in a somewhat awkward manner (leaving room for discrepancies between the real schema and the one passed in). The ImportTsv tool should go through the same Phoenix APIs as the CSV bulk loader, IMO.
Thoughts? Would be interested in your opinions, [~gabriel.reid] and [~maghamravikiran].

A ImportTsv tool for phoenix to build table data and all index data.
Key: PHOENIX-1056
URL: https://issues.apache.org/jira/browse/PHOENIX-1056
Project: Phoenix
Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
Fix For: 3.1
Attachments: PHOENIX-1056.patch

I have just built a tool that builds table data and index table data, just like the ImportTsv job: http://hbase.apache.org/book/ops_mgt.html#importtsv When ImportTsv runs, it writes HFiles into a path named for each column family. For example, if a table has two CFs, A and B, the output is ./outputpath/A and ./outputpath/B. In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo, so the output will be ./outputpath/TableOne/A, ./outputpath/TableOne/B, ./outputpath/IdxOne, and ./outputpath/IdxTwo. If anyone needs it, I will build a clean tool.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (PHOENIX-1067) Add documentation for ANY/ALL support with arrays
[ https://issues.apache.org/jira/browse/PHOENIX-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor reopened PHOENIX-1067: --- Thanks for the additional documentation - it looks good, [~ram_krish]. In your local svn repo, you need to run the build.sh script. It will generate the html based on your md file. Then you need to check in the html file, and the site will be updated with your changes.

Add documentation for ANY/ALL support with arrays
Key: PHOENIX-1067
URL: https://issues.apache.org/jira/browse/PHOENIX-1067
Project: Phoenix
Issue Type: Improvement
Reporter: James Taylor
Assignee: ramkrishna.s.vasudevan
Priority: Minor
Attachments: Phoenix-1067.patch

Please add some documentation and a few examples for the new ANY/ALL support for arrays here: http://phoenix.apache.org/array_type.html Our website lives in svn - just check out https://svn.apache.org/repos/asf/phoenix, find the relevant source file (typically a markdown file with a .md extension), modify it, build the website using the build.sh script, and check in the modified files.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1067) Add documentation for ANY/ALL support with arrays
[ https://issues.apache.org/jira/browse/PHOENIX-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056169#comment-14056169 ] ramkrishna.s.vasudevan commented on PHOENIX-1067: --- Done. And now the site is updated. Thanks, James.

Add documentation for ANY/ALL support with arrays
Key: PHOENIX-1067
URL: https://issues.apache.org/jira/browse/PHOENIX-1067
Project: Phoenix
Issue Type: Improvement
Reporter: James Taylor
Assignee: ramkrishna.s.vasudevan
Priority: Minor
Attachments: Phoenix-1067.patch

Please add some documentation and a few examples for the new ANY/ALL support for arrays here: http://phoenix.apache.org/array_type.html Our website lives in svn - just check out https://svn.apache.org/repos/asf/phoenix, find the relevant source file (typically a markdown file with a .md extension), modify it, build the website using the build.sh script, and check in the modified files.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1016) Support MINVALUE, MAXVALUE, and CYCLE options in CREATE SEQUENCE
[ https://issues.apache.org/jira/browse/PHOENIX-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056217#comment-14056217 ] James Taylor commented on PHOENIX-1016: --- Thanks for the patch, [~tdsilva]. It's looking very good. We need to maintain backward compatibility, though. Here are some ideas. How about making the following two changes to maintain backward compatibility for startWith?
1. Store minValue and maxValue as null in CreateSequenceStatement if they're not specified:
{code}
+protected CreateSequenceStatement(TableName sequenceName,
+        ParseNode startsWith, ParseNode incrementBy, ParseNode cacheSize,
+        ParseNode minValue, ParseNode maxValue, boolean cycle,
+        boolean ifNotExists, int bindCount) {
+    this.sequenceName = sequenceName;
+    this.minValue = minValue == null ? new LiteralParseNode(Long.MIN_VALUE) : minValue;
+    this.maxValue = maxValue == null ? new LiteralParseNode(Long.MAX_VALUE) : maxValue;
{code}
2. Initialize startsWithValue in CreateSequenceCompiler to 1 if minValueNode is null and incrementBy is positive, or maxValueNode is null and incrementBy is negative?
{code}
+if (startsWithNode == null) {
+    startsWithValue = incrementBy > 0 ? minValue : maxValue;
+} else {
{code}
Can you tell me more about the other change (i.e. why you need to store the current value instead of the next value)? This one may be OK, as wouldn't existing sequences just skip a sequence value (which is fine)? Or what would happen? If they'd return the same value again, that would be a problem. If we had to, we could have a conversion script to increment all sequences, but that would be a bit of a pain, so I'd like to avoid it if possible.

Support MINVALUE, MAXVALUE, and CYCLE options in CREATE SEQUENCE
Key: PHOENIX-1016
URL: https://issues.apache.org/jira/browse/PHOENIX-1016
Project: Phoenix
Issue Type: Bug
Reporter: James Taylor
Assignee: Thomas D'Silva
Attachments: PHOENIX-1016.3.0.patch, PHOENIX-1016.patch

We currently don't support the MINVALUE, MAXVALUE, and CYCLE options in CREATE SEQUENCE, but we should. See http://msdn.microsoft.com/en-us/library/ff878091.aspx for the syntax. I believe MINVALUE applies if the INCREMENT is negative, while MAXVALUE applies otherwise. If the value of a sequence goes beyond MINVALUE/MAXVALUE, then:
- if CYCLE is true, the sequence value should start again at the START WITH value (or the MINVALUE if specified too? Not sure about this).
- if CYCLE is false, an exception should be thrown.
To implement this:
- make the grammar changes in PhoenixSQL.g
- add member variables for MINVALUE, MAXVALUE, and CYCLE to CreateSequenceStatement
- add the appropriate error checking and handle bind variables for these new options in CreateSequenceCompiler
- modify the MetaDataClient.createSequence() call by passing along these new parameters
- same for the ConnectionQueryServices.createSequence() call
- same for Sequence.createSequence()
- pass along these parameters as new KeyValues in the Append that constitutes the RPC call
- act on these in the SequenceRegionObserver coprocessor as indicated above

-- This message was sent by Atlassian JIRA (v6.2#6252)
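To make the syntax under discussion concrete, here is a minimal JDBC sketch of the proposed options. This is illustrative only: the connection URL and sequence name are hypothetical, and the option clause follows the syntax proposed in this issue (modeled on the SQL Server reference above), not necessarily the final shipped grammar.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SequenceOptionsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL; adjust for your cluster.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        Statement stmt = conn.createStatement();
        // Descending sequence: with a negative INCREMENT, MINVALUE is the bound
        // that matters; CYCLE restarts the sequence instead of throwing an
        // exception once the bound is passed.
        stmt.execute("CREATE SEQUENCE my_seq START WITH 100 INCREMENT BY -1 "
                   + "MINVALUE 1 MAXVALUE 100 CYCLE");
        conn.close();
    }
}
{code}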
[jira] [Commented] (PHOENIX-1071) Provide integration for exposing Phoenix tables as Spark RDDs
[ https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056222#comment-14056222 ] Josh Mahonin commented on PHOENIX-1071: --- Hi Andrew, With the phoenix-pig module, there's a PhoenixInputFormat and a PhoenixOutputFormat that Spark can use to create an RDD. I'm able to both read and write Phoenix data from Spark this way. Example:
{code}
val phoenixConf = new PhoenixPigConfiguration(new Configuration())
phoenixConf.setSelectStatement("SOME SELECT STATEMENT")
phoenixConf.setSelectColumns("COMMA,SEPARATED,COLUMNS")
phoenixConf.setSchemaType(SchemaType.QUERY)
phoenixConf.configure("db-server", "SOME_TABLE", 100L)
val phoenixRDD = sc.newAPIHadoopRDD(
    phoenixConf.getConfiguration(),
    classOf[PhoenixInputFormat],
    classOf[NullWritable],
    classOf[PhoenixRecord])
{code}

Provide integration for exposing Phoenix tables as Spark RDDs
Key: PHOENIX-1071
URL: https://issues.apache.org/jira/browse/PHOENIX-1071
Project: Phoenix
Issue Type: New Feature
Reporter: Andrew Purtell

A core concept of Apache Spark is the resilient distributed dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel. One can create RDDs referencing a dataset in any external storage system offering a Hadoop InputFormat, like HBase's TableInputFormat and TableSnapshotInputFormat. Phoenix, as a JDBC driver supporting a SQL dialect, can provide interesting and deep integration. Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} action, implicitly creating necessary schema on demand. Add support for {{filter}} transformations that push predicates to the server. Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
{code}
// Count the number of different coffee varieties offered by each
// supplier from Guatemala
phoenixTable("coffees")
    .select(c => where(c.origin == "GT"))
    .countByKey()
    .foreach(r => println(r._1 + " = " + r._2))
{code}
Support conversions between Scala and Java types and Phoenix table data.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational
jay wong created PHOENIX-1074: --- Summary: ParallelIteratorRegionSplitterFactory get Splits is not rational
Key: PHOENIX-1074
URL: https://issues.apache.org/jira/browse/PHOENIX-1074
Project: Phoenix
Issue Type: Wish
Reporter: jay wong

Create a table:
{code}
create table if not exists table1(
    gmt VARCHAR NOT NULL,
    spm_type VARCHAR NOT NULL,
    spm VARCHAR NOT NULL,
    A.int_a INTEGER,
    B.int_b INTEGER,
    B.int_c INTEGER
    CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4, bloomfilter='ROW';
{code}
and split the table into partitions like this:
|startrow|endrow|
| |\x0020140201|
|\x0020140201|\x0020140202|
|\x0020140202|\x0020140203|
|\x0020140203|\x0020140204|
|\x0020140204|\x0020140205|
|\x0020140205|\x0020140206|
|\x0020140206|\x0020140207|
|\x0020140207|\x0120140201|
|\x0120140201|\x0120140202|
|\x0120140202|\x0120140203|
|\x0120140203|\x0120140204|
|\x0120140204|\x0120140205|
|\x0120140205|\x0120140206|
|\x0120140206|\x0120140207|
|\x0120140207|\x0220140201|
|\x0220140201|\x0220140202|
|\x0220140202|\x0220140203|
|\x0220140203|\x0220140204|
|\x0220140204|\x0220140205|
|\x0220140205|\x0220140206|
|\x0220140206|\x0220140207|
|\x0220140207|\x0320140201|
|\x0320140201|\x0320140202|
|\x0320140202|\x0320140203|
|\x0320140203|\x0320140204|
|\x0320140204|\x0320140205|
|\x0320140205|\x0320140206|
|\x0320140206|\x0320140207|
|\x0320140207| |
Then insert some data:
|GMT | SPM_TYPE |SPM | INT_A| INT_B| INT_C|
| 20140201 | 1 | 1.2.3.4546 | 218| 218| null |
| 20140201 | 1 | 1.2.44545 | 190| 190| null |
| 20140201 | 1 | 1.353451312 | 246| 246| null |
| 20140201 | 2 | 1.2.3.6775 | 183| 183| null |
|...|...|...|...|...|...|
| 20140207 | 3 | 1.2.3.4546 | 224| 224| null |
| 20140207 | 3 | 1.2.44545 | 196| 196| null |
| 20140207 | 3 | 1.353451312 | 168| 168| null |
| 20140207 | 4 | 1.2.3.6775 | 189| 189| null |
| 20140207 | 4 | 1.23.345345 | 217| 217| null |
| 20140207 | 4 | 1.23234234234 | 245| 245| null |
Print a log like this:
{code}
public class ParallelIterators extends ExplainTable implements ResultIterators {
    @Override
    public List<PeekingResultIterator> getIterators() throws SQLException {
        boolean success = false;
        final ConnectionQueryServices services = context.getConnection().getQueryServices();
        ReadOnlyProps props = services.getProps();
        int numSplits = splits.size();
        List<PeekingResultIterator> iterators = new ArrayList<PeekingResultIterator>(numSplits);
        List<Pair<byte[], Future<PeekingResultIterator>>> futures =
                new ArrayList<Pair<byte[], Future<PeekingResultIterator>>>(numSplits);
        final UUID scanId = UUID.randomUUID();
        try {
            ExecutorService executor = services.getExecutor();
            System.out.println("the split size is " + numSplits);
        }
    }
}
{code}
Then execute some SQL:
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
the split size is 31
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2'
the split size is 31
select * from table1 where gmt > '20140202' and gmt < '20140207'
the split size is 27
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2' and spm like '1.%'
the split size is 28
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2'
the split size is 28
select * from table1 where gmt > '20140202' and gmt < '20140204'
the split size is 12
{code}
But I think
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
{code}
and
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207'
{code}
should have the same splits, so why don't they?
-- This message was sent by Atlassian JIRA (v6.2#6252)
pull request for local indexes to go into master
Rajeshbabu has submitted a pull request for local indexes to go into master. It'd be great if a few other folks could give it a look: https://github.com/apache/phoenix/pull/1 @JeffreyZ? @Jesse? Thanks, James
[jira] [Commented] (PHOENIX-1073) A memory table in every region is needed?
[ https://issues.apache.org/jira/browse/PHOENIX-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056427#comment-14056427 ] James Taylor commented on PHOENIX-1073: --- Probably best to start on the dev list to discuss this first, rather than file a JIRA. Yes, HBase is much faster if all the data is in its cache.

A memory table in every region is needed?
Key: PHOENIX-1073
URL: https://issues.apache.org/jira/browse/PHOENIX-1073
Project: Phoenix
Issue Type: Wish
Reporter: jay wong

When I do a group by query, assume a region has 30M of data: 100K rows with 30 KVs per row. The RT of GroupedAggregateRegionObserver is about 3 sec, but most of that time (in fact nearly 2.2 sec) is spent on the RegionScanner scanning all the rows. I ran a test: the first time it scans all of the data into memory; the second time it only loads the data from memory. The RT of GroupedAggregateRegionObserver is then only 0.7s. So is a memory table needed for Phoenix's computationally intensive scenarios?

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (PHOENIX-1073) A memory table in every region is needed?
[ https://issues.apache.org/jira/browse/PHOENIX-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor resolved PHOENIX-1073. --- Resolution: Invalid

A memory table in every region is needed?
Key: PHOENIX-1073
URL: https://issues.apache.org/jira/browse/PHOENIX-1073
Project: Phoenix
Issue Type: Wish
Reporter: jay wong

When I do a group by query, assume a region has 30M of data: 100K rows with 30 KVs per row. The RT of GroupedAggregateRegionObserver is about 3 sec, but most of that time (in fact nearly 2.2 sec) is spent on the RegionScanner scanning all the rows. I ran a test: the first time it scans all of the data into memory; the second time it only loads the data from memory. The RT of GroupedAggregateRegionObserver is then only 0.7s. So is a memory table needed for Phoenix's computationally intensive scenarios?

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1062) A SQL Trimmer for log sql execute times
[ https://issues.apache.org/jira/browse/PHOENIX-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056430#comment-14056430 ] James Taylor commented on PHOENIX-1062: --- I don't think everyone would universally want to normalize their queries in this way.

A SQL Trimmer for log sql execute times
Key: PHOENIX-1062
URL: https://issues.apache.org/jira/browse/PHOENIX-1062
Project: Phoenix
Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: jay wong
Priority: Critical
Fix For: 3.1
Attachments: SQLTrimmer.java

If we need statistics on how many times each SQL statement executes, such as: select a,b,c from table1 where d=13 and e='abc' limit 20; the condition values are not needed, since they prevent repeated statements from overlapping, so the SQL will be trimmed to: select a,b,c from table1 where d=? and e=? limit ?; This tool now handles that.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1055) Add support for the built-in functions HEX, OCT, and BIN
[ https://issues.apache.org/jira/browse/PHOENIX-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056436#comment-14056436 ] James Taylor commented on PHOENIX-1055: --- It's kind of a nice-to-have for every encoder to have a decoder (or vice versa). Can you relax that requirement, perhaps having null represent the absence of one or the other?

Add support for the built-in functions HEX, OCT, and BIN
Key: PHOENIX-1055
URL: https://issues.apache.org/jira/browse/PHOENIX-1055
Project: Phoenix
Issue Type: New Feature
Reporter: Kyle Buzsaki
Attachments: PHOENIX-1055.patch, PHOENIX-1055_2.patch

Add built-in functions to produce hexadecimal, octal, and binary string representations of numeric values. Example function specifications:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_oct
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_bin

-- This message was sent by Atlassian JIRA (v6.2#6252)
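The MySQL references above pin down the expected outputs for these functions. As a sanity check of those semantics (not of the Phoenix patch itself, which isn't shown here), the same representations can be produced with plain Java:
{code}
public class HexOctBinSemantics {
    public static void main(String[] args) {
        // Per the MySQL docs: HEX(255) -> "FF", OCT(8) -> "10", BIN(5) -> "101"
        System.out.println(Long.toHexString(255).toUpperCase()); // FF
        System.out.println(Long.toOctalString(8));               // 10
        System.out.println(Long.toBinaryString(5));              // 101
    }
}
{code}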
[jira] [Created] (PHOENIX-1075) Mathematical order of operations are improperly evaluated.
Kyle Buzsaki created PHOENIX-1075: --- Summary: Mathematical order of operations are improperly evaluated.
Key: PHOENIX-1075
URL: https://issues.apache.org/jira/browse/PHOENIX-1075
Project: Phoenix
Issue Type: Bug
Reporter: Kyle Buzsaki

The root of the issue is that, as things are now, multiplication and division don't actually have the same precedence in the grammar. Division is always grouped more tightly than multiplication and is evaluated first. Most of the time this doesn't matter, but combined with the truncating integer division used by LongDivideExpression, it produces some unexpected and probably wrong behavior. Below is an example.
Expression: 6 * 4 / 3
Evaluating left to right, this should reduce as follows:
6 * 4 / 3
24 / 3
8
As Phoenix is now, division has a higher precedence than multiplication. Therefore, the resulting expression tree looks like this:
!http://i.imgur.com/2Zzsfpy.png!
Because integer division is truncating, by the time the division has evaluated, the expression tree looks like this:
!http://i.imgur.com/3cLGD0e.png!
which then evaluates to 6.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1075) Mathematical order of operations are improperly evaluated.
[ https://issues.apache.org/jira/browse/PHOENIX-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056468#comment-14056468 ] James Taylor commented on PHOENIX-1075: --- Nice catch. There may be something in ANTLR that'll help us with this. Not sure how SQL defines precedence should work. In the meantime, users can always use parentheses to work around this. I don't think this should delay the work for the % operator.

Mathematical order of operations are improperly evaluated.
Key: PHOENIX-1075
URL: https://issues.apache.org/jira/browse/PHOENIX-1075
Project: Phoenix
Issue Type: Bug
Reporter: Kyle Buzsaki

The root of the issue is that, as things are now, multiplication and division don't actually have the same precedence in the grammar. Division is always grouped more tightly than multiplication and is evaluated first. Most of the time this doesn't matter, but combined with the truncating integer division used by LongDivideExpression, it produces some unexpected and probably wrong behavior. Below is an example.
Expression: 6 * 4 / 3
Evaluating left to right, this should reduce as follows:
6 * 4 / 3
24 / 3
8
As Phoenix is now, division has a higher precedence than multiplication. Therefore, the resulting expression tree looks like this:
!http://i.imgur.com/2Zzsfpy.png!
Because integer division is truncating, by the time the division has evaluated, the expression tree looks like this:
!http://i.imgur.com/3cLGD0e.png!
which then evaluates to 6.

-- This message was sent by Atlassian JIRA (v6.2#6252)
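To make the workaround and the bug concrete: the two groupings can be reproduced with plain Java integer arithmetic, whose truncating division behaves like the LongDivideExpression described above. Explicit parentheses, as suggested in the comment, force the intended grouping.
{code}
public class PrecedenceDemo {
    public static void main(String[] args) {
        // Left-to-right grouping, (6 * 4) / 3: what users expect.
        System.out.println((6 * 4) / 3); // 24 / 3 = 8
        // The grouping Phoenix's grammar currently produces, 6 * (4 / 3):
        // truncating integer division turns 4 / 3 into 1 first.
        System.out.println(6 * (4 / 3)); // 6 * 1 = 6
    }
}
{code}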
[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056538#comment-14056538 ] Jeffrey Zhong commented on PHOENIX-1056: --- The other issues can be easily addressed, except for aligning the index HFile region boundaries during the MR job; otherwise LoadIncrementalHFiles will become a heavy operation. [~jaywong], have you tried the ImportTsv tool internally, so that you can see the performance difference between one single MR job (plus loading HFiles) and multiple MR jobs run concurrently?

A ImportTsv tool for phoenix to build table data and all index data.
Key: PHOENIX-1056
URL: https://issues.apache.org/jira/browse/PHOENIX-1056
Project: Phoenix
Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
Fix For: 3.1
Attachments: PHOENIX-1056.patch

I have just built a tool that builds table data and index table data, just like the ImportTsv job: http://hbase.apache.org/book/ops_mgt.html#importtsv When ImportTsv runs, it writes HFiles into a path named for each column family. For example, if a table has two CFs, A and B, the output is ./outputpath/A and ./outputpath/B. In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo, so the output will be ./outputpath/TableOne/A, ./outputpath/TableOne/B, ./outputpath/IdxOne, and ./outputpath/IdxTwo. If anyone needs it, I will build a clean tool.

-- This message was sent by Atlassian JIRA (v6.2#6252)
TO_UNSIGNED_DATE function missing in Phoenix
Hi All, Phoenix has the data type UNSIGNED_DATE, but I am unable to use these columns for filtering. For a date column in a SQL query I can use to_date(); I think we similarly need a to_unsigned_date function. I can file a JIRA for this. Can anyone guide me on how to introduce this function into the SQL language of Phoenix? -- Thanks & Regards, Anil Gupta
[jira] [Created] (PHOENIX-1076) to_unsigned_date() function missing in sql of Phoenix
Anil Gupta created PHOENIX-1076: --- Summary: to_unsigned_date() function missing in sql of Phoenix
Key: PHOENIX-1076
URL: https://issues.apache.org/jira/browse/PHOENIX-1076
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Anil Gupta

Hi All, Phoenix has the data type UNSIGNED_DATE, but I am unable to use these columns for filtering. For a date column in a SQL query I can use to_date(); I think we similarly need a to_unsigned_date function. Can anyone guide me on how to introduce this function into the SQL language of Phoenix?

-- ~Anil Gupta

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PHOENIX-1077) IN list of row value constructors doesn't work for tenant specific views
Samarth Jain created PHOENIX-1077: --- Summary: IN list of row value constructors doesn't work for tenant specific views
Key: PHOENIX-1077
URL: https://issues.apache.org/jira/browse/PHOENIX-1077
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Samarth Jain

IN list of row value constructors doesn't work when queried against tenant views for multi-tenant Phoenix tables. Consider this test (added in TenantSpecificTablesDMLIT.java):
{code}
public void testRVCOnTenantSpecificTable() throws Exception {
    Connection conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
    try {
        conn.setAutoCommit(true);
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (1, 'BonA')");
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (2, 'BonB')");
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (3, 'BonC')");
        conn.close();
        conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
        PreparedStatement stmt = conn.prepareStatement("select id from " + TENANT_TABLE_NAME
                + " WHERE (id, user) IN ((?, ?), (?, ?), (?, ?))");
        stmt.setInt(1, 1);
        stmt.setString(2, "BonA");
        stmt.setInt(3, 2);
        stmt.setString(4, "BonB");
        stmt.setInt(5, 3);
        stmt.setString(6, "BonC");
        ResultSet rs = stmt.executeQuery();
        assertTrue(rs.next());
        assertEquals(1, rs.getInt(1));
        assertTrue(rs.next());
        assertEquals(2, rs.getInt(1));
        assertTrue(rs.next());
        assertEquals(3, rs.getInt(1));
        assertFalse(rs.next());
    } finally {
        conn.close();
    }
}
{code}
Replacing TENANT_TABLE_NAME with PARENT_TABLE_NAME (that is, the base table), the test works fine.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: TO_UNSIGNED_DATE function missing in Phoenix
Hi Anil, Try using CAST to explicitly cast the result to an unsigned date, like this: CAST(TO_DATE(someDate) AS UNSIGNED_DATE) Thanks, James

On Wed, Jul 9, 2014 at 8:38 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, Phoenix has the data type UNSIGNED_DATE, but I am unable to use these columns for filtering. For a date column in a SQL query I can use to_date(); I think we similarly need a to_unsigned_date function. I can file a JIRA for this. Can anyone guide me on how to introduce this function into the SQL language of Phoenix? -- Thanks & Regards, Anil Gupta
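For reference, a sketch of issuing the suggested query over JDBC, using the table and column names from this thread; the connection URL is hypothetical. Note that the follow-up messages below report a parser error with this CAST form on Phoenix 3.0, so treat this as the suggested shape of the query rather than a verified working example.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UnsignedDateFilterSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL; events/dummy_date/id come from the thread.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        PreparedStatement ps = conn.prepareStatement(
            "SELECT * FROM events"
            + " WHERE dummy_date = CAST(TO_DATE(?, 'yyyy-MM-dd') AS UNSIGNED_DATE)"
            + " AND id = ? LIMIT 50");
        ps.setString(1, "2012-12-23");
        ps.setString(2, "1234");
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString("id"));
        }
        conn.close();
    }
}
{code}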
[jira] [Commented] (PHOENIX-1076) to_unsigned_date() function missing in sql of Phoenix
[ https://issues.apache.org/jira/browse/PHOENIX-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056606#comment-14056606 ] Anil Gupta commented on PHOENIX-1076: --- [~jamestaylor]: Can you provide me the high-level steps to implement the to_unsigned_date() function?

to_unsigned_date() function missing in sql of Phoenix
Key: PHOENIX-1076
URL: https://issues.apache.org/jira/browse/PHOENIX-1076
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Anil Gupta

Hi All, Phoenix has the data type UNSIGNED_DATE, but I am unable to use these columns for filtering. For a date column in a SQL query I can use to_date(); I think we similarly need a to_unsigned_date function. Can anyone guide me on how to introduce this function into the SQL language of Phoenix?

-- ~Anil Gupta

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: TO_UNSIGNED_DATE function missing in Phoenix
Hi James, I tried the following query:
select * from events where dummy_date=CAST(TO_DATE('2012-12-23', 'yyyy-MM-dd') AS UNSIGNED_DATE) and id='1234' limit 50;
But I get the following error:
Error: ERROR 602 (42P00): Syntax error. Missing LPAREN at line 1, column 30. (state=42P00,code=602)
Can you tell me what's wrong here? Thanks, Anil Gupta

On Wed, Jul 9, 2014 at 11:48 AM, James Taylor jamestay...@apache.org wrote: Hi Anil, Try using CAST to explicitly cast the result to an unsigned date, like this: CAST(TO_DATE(someDate) AS UNSIGNED_DATE) Thanks, James

On Wed, Jul 9, 2014 at 8:38 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, Phoenix has the data type UNSIGNED_DATE, but I am unable to use these columns for filtering. For a date column in a SQL query I can use to_date(); I think we similarly need a to_unsigned_date function. I can file a JIRA for this. Can anyone guide me on how to introduce this function into the SQL language of Phoenix? -- Thanks & Regards, Anil Gupta

-- Thanks & Regards, Anil Gupta
[jira] [Commented] (PHOENIX-938) Use higher priority queue for index updates to prevent deadlock
[ https://issues.apache.org/jira/browse/PHOENIX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056785#comment-14056785 ] Jesse Yates commented on PHOENIX-938: --- Started looking at this today. At first, 0.98.3 is pretty simple. Then it starts to get complicated, as 0.98.4 completely changed how RPC scheduling is implemented (HBASE-11355). I don't know if we have the bandwidth to continually monitor all the possible changes to the scheduler code to support this. Further, as we look to real transactions, this implementation becomes somewhat moot; maybe we just leave the code as-is? The nitty-gritty of the details is that 0.98.4 introduced the idea of an RpcExecutor (which is a great improvement over the current munging), but that isn't in 0.98.3, so we would need to port that class to Phoenix (losing any updates from the HBase community) - but that's kinda already what I was doing with this patch, so maybe that's alright for now. Now, we could have a whole reflection framework to support the different HBase versions we are running (which becomes a testing pain, but doable) and then pick the most optimal one (0.98.4+ just uses RpcExecutor as-is, 0.98.3 uses the copied code, <=0.98.2 ignores it). Or we can copy the changed implementations back and just use the same thing everywhere, but then we lose out on changes... There really isn't a clean solution here :-/ Really, this stems from the RpcScheduler code being a private interface in HBase while we want to leverage it outside HBase. Thoughts, [~jamestaylor]?

Use higher priority queue for index updates to prevent deadlock
Key: PHOENIX-938
URL: https://issues.apache.org/jira/browse/PHOENIX-938
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.0.0, 4.1
Reporter: James Taylor
Assignee: Jesse Yates
Fix For: 5.0.0, 4.1
Attachments: phoenix-938-4.0-v0.patch, phoenix-938-master-v0.patch, phoenix-938-master-v1.patch

With our current global secondary indexing solution, a batched Put of table data causes a RS to do a batch Put to other RSs. This has the potential to lead to a deadlock if all RS are overloaded and unable to process the pending batched Put. To prevent this, we should use a higher priority queue to submit these Puts so that they're always processed before other Puts. This will prevent the potential for a deadlock under high load. Note that this will likely require some HBase 0.98 code changes and would not be feasible to implement for HBase 0.94.

-- This message was sent by Atlassian JIRA (v6.2#6252)
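A minimal sketch of the capability probe such a reflection framework could start from, assuming the class HBASE-11355 introduced lives at org.apache.hadoop.hbase.ipc.RpcExecutor (an assumption; this thread doesn't confirm the package):
{code}
public final class RpcSchedulerSupport {
    private RpcSchedulerSupport() {}

    /**
     * Probe for the RpcExecutor class introduced in HBase 0.98.4 (HBASE-11355).
     * Callers can use the result to pick the matching scheduler implementation
     * at runtime: the stock RpcExecutor on 0.98.4+, the copied code on 0.98.3,
     * and a no-op on 0.98.2 and earlier.
     */
    public static boolean hasRpcExecutor() {
        try {
            // Assumed fully qualified name; not confirmed in this thread.
            Class.forName("org.apache.hadoop.hbase.ipc.RpcExecutor");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
{code}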
[jira] [Commented] (PHOENIX-1071) Provide integration for exposing Phoenix tables as Spark RDDs
[ https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056796#comment-14056796 ] Andrew Purtell commented on PHOENIX-1071: --- Thanks [~jmahonin], good point. We might start with the existing input/output formats, as suggested for HBase on HBASE-11482.

Provide integration for exposing Phoenix tables as Spark RDDs
Key: PHOENIX-1071
URL: https://issues.apache.org/jira/browse/PHOENIX-1071
Project: Phoenix
Issue Type: New Feature
Reporter: Andrew Purtell

A core concept of Apache Spark is the resilient distributed dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel. One can create RDDs referencing a dataset in any external storage system offering a Hadoop InputFormat, like HBase's TableInputFormat and TableSnapshotInputFormat. Phoenix, as a JDBC driver supporting a SQL dialect, can provide interesting and deep integration. Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} action, implicitly creating necessary schema on demand. Add support for {{filter}} transformations that push predicates to the server. Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
{code}
// Count the number of different coffee varieties offered by each
// supplier from Guatemala
phoenixTable("coffees")
    .select(c => where(c.origin == "GT"))
    .countByKey()
    .foreach(r => println(r._1 + " = " + r._2))
{code}
Support conversions between Scala and Java types and Phoenix table data.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PHOENIX-1071) Provide integration for exposing Phoenix tables as Spark RDDs
[ https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated PHOENIX-1071: --- Description:
A core concept of Apache Spark is the resilient distributed dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel. One can create RDDs referencing a dataset in any external storage system offering a Hadoop InputFormat, like PhoenixInputFormat and PhoenixOutputFormat. There could be opportunities for additional interesting and deep integration. Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} action, implicitly creating necessary schema on demand. Add support for {{filter}} transformations that push predicates to the server. Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
{code}
// Count the number of different coffee varieties offered by each
// supplier from Guatemala
phoenixTable("coffees")
    .select(c => where(c.origin == "GT"))
    .countByKey()
    .foreach(r => println(r._1 + " = " + r._2))
{code}
Support conversions between Scala and Java types and Phoenix table data.

was:
A core concept of Apache Spark is the resilient distributed dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel. One can create RDDs referencing a dataset in any external storage system offering a Hadoop InputFormat, like HBase's TableInputFormat and TableSnapshotInputFormat. Phoenix, as a JDBC driver supporting a SQL dialect, can provide interesting and deep integration. Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} action, implicitly creating necessary schema on demand. Add support for {{filter}} transformations that push predicates to the server. Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
{code}
// Count the number of different coffee varieties offered by each
// supplier from Guatemala
phoenixTable("coffees")
    .select(c => where(c.origin == "GT"))
    .countByKey()
    .foreach(r => println(r._1 + " = " + r._2))
{code}
Support conversions between Scala and Java types and Phoenix table data.

Provide integration for exposing Phoenix tables as Spark RDDs
Key: PHOENIX-1071
URL: https://issues.apache.org/jira/browse/PHOENIX-1071
Project: Phoenix
Issue Type: New Feature
Reporter: Andrew Purtell

A core concept of Apache Spark is the resilient distributed dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel. One can create RDDs referencing a dataset in any external storage system offering a Hadoop InputFormat, like PhoenixInputFormat and PhoenixOutputFormat. There could be opportunities for additional interesting and deep integration. Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} action, implicitly creating necessary schema on demand. Add support for {{filter}} transformations that push predicates to the server. Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
{code}
// Count the number of different coffee varieties offered by each
// supplier from Guatemala
phoenixTable("coffees")
    .select(c => where(c.origin == "GT"))
    .countByKey()
    .foreach(r => println(r._1 + " = " + r._2))
{code}
Support conversions between Scala and Java types and Phoenix table data.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-938) Use higher priority queue for index updates to prevent deadlock
[ https://issues.apache.org/jira/browse/PHOENIX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056818#comment-14056818 ] Andrew Purtell commented on PHOENIX-938: --- I linked over to HBASE-11355 with a comment pointing here, so watchers on that issue can find their way over.

Use higher priority queue for index updates to prevent deadlock
Key: PHOENIX-938
URL: https://issues.apache.org/jira/browse/PHOENIX-938
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.0.0, 4.1
Reporter: James Taylor
Assignee: Jesse Yates
Fix For: 5.0.0, 4.1
Attachments: phoenix-938-4.0-v0.patch, phoenix-938-master-v0.patch, phoenix-938-master-v1.patch

With our current global secondary indexing solution, a batched Put of table data causes a RS to do a batch Put to other RSs. This has the potential to lead to a deadlock if all RS are overloaded and unable to process the pending batched Put. To prevent this, we should use a higher priority queue to submit these Puts so that they're always processed before other Puts. This will prevent the potential for a deadlock under high load. Note that this will likely require some HBase 0.98 code changes and would not be feasible to implement for HBase 0.94.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-938) Use higher priority queue for index updates to prevent deadlock
[ https://issues.apache.org/jira/browse/PHOENIX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056848#comment-14056848 ] Jesse Yates commented on PHOENIX-938: --- Thanks Andy! Perhaps just a bit tired today to find my way to the community solution :)

Use higher priority queue for index updates to prevent deadlock
Key: PHOENIX-938
URL: https://issues.apache.org/jira/browse/PHOENIX-938
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.0.0, 4.1
Reporter: James Taylor
Assignee: Jesse Yates
Fix For: 5.0.0, 4.1
Attachments: phoenix-938-4.0-v0.patch, phoenix-938-master-v0.patch, phoenix-938-master-v1.patch

With our current global secondary indexing solution, a batched Put of table data causes a RS to do a batch Put to other RSs. This has the potential to lead to a deadlock if all RS are overloaded and unable to process the pending batched Put. To prevent this, we should use a higher priority queue to submit these Puts so that they're always processed before other Puts. This will prevent the potential for a deadlock under high load. Note that this will likely require some HBase 0.98 code changes and would not be feasible to implement for HBase 0.94.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: TO_UNSIGNED_DATE function missing in Phoenix
Yes, that should work. Looks like a bug to me.

On Wednesday, July 9, 2014, anil gupta anilgupt...@gmail.com wrote: Hi James, I am using Phoenix 3.0 along with HBase 0.94.15. Do you mean to say that this query should work even if dummy_date is an UNSIGNED_DATE?
select * from events where dummy_date=TO_DATE('2012-12-23', 'yyyy-MM-dd') and id='1234' limit 50;
I tried this query and I didn't get correct results. I have data in the table where dummy_date = 2012-12-23. ~Anil

On Wed, Jul 9, 2014 at 12:13 PM, James Taylor jamestay...@apache.org wrote: Not sure, as it looks correct. What version of Phoenix are you using? FWIW, that query should work without the cast too. Thanks, James

On Wednesday, July 9, 2014, anil gupta anilgupt...@gmail.com wrote: Hi James, I tried the following query:
select * from events where dummy_date=CAST(TO_DATE('2012-12-23', 'yyyy-MM-dd') AS UNSIGNED_DATE) and id='1234' limit 50;
But I get the following error:
Error: ERROR 602 (42P00): Syntax error. Missing LPAREN at line 1, column 30. (state=42P00,code=602)
Can you tell me what's wrong here? Thanks, Anil Gupta

On Wed, Jul 9, 2014 at 11:48 AM, James Taylor jamestay...@apache.org wrote: Hi Anil, Try using CAST to explicitly cast the result to an unsigned date, like this: CAST(TO_DATE(someDate) AS UNSIGNED_DATE) Thanks, James

On Wed, Jul 9, 2014 at 8:38 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, Phoenix has the data type UNSIGNED_DATE, but I am unable to use these columns for filtering. For a date column in a SQL query I can use to_date(); I think we similarly need a to_unsigned_date function. I can file a JIRA for this. Can anyone guide me on how to introduce this function into the SQL language of Phoenix? -- Thanks & Regards, Anil Gupta

-- Thanks & Regards, Anil Gupta

-- Thanks & Regards, Anil Gupta
[jira] [Created] (PHOENIX-1078) Unable to run pig script with Phoenix.
Anil Gupta created PHOENIX-1078: --- Summary: Unable to run pig script with Phoenix.
Key: PHOENIX-1078
URL: https://issues.apache.org/jira/browse/PHOENIX-1078
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Anil Gupta

I am running HBase 0.94.15 and the latest Phoenix 3.1 nightly build. I have to use Pig on Phoenix views. When I run the job I get the following error:
ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable - Error in readFields
java.lang.NegativeArraySizeException: -1
at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:175)
at org.apache.phoenix.schema.PColumnImpl.readFields(PColumnImpl.java:157)
at org.apache.phoenix.schema.PTableImpl.readFields(PTableImpl.java:721)
at org.apache.phoenix.coprocessor.MetaDataProtocol$MetaDataMutationResult.readFields(MetaDataProtocol.java:161)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:692)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:596)
at org.apache.hadoop.hbase.client.coprocessor.ExecResult.readFields(ExecResult.java:83)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:692)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:333)
at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.receiveResponse(SecureClient.java:383)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:588)
Here is my pig script:
register /tmp/phoenix-3.1-jars/phoenix-core-3.1.0-SNAPSHOT.jar;
register /tmp/phoenix-3.1-jars/phoenix-pig-3.1.0-SNAPSHOT.jar;
A = load 'hbase://query/SELECT * from test_table' using org.apache.phoenix.pig.PhoenixHBaseLoader('ZK');
grpd = GROUP A BY UNSIGNED_DATE_COLUMN;
cnt = FOREACH grpd GENERATE group AS UNSIGNED_DATE_COLUMN,COUNT(A);
DUMP cnt;

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1069) Improve CsvBulkLoadTool to build indexes when loading data.
[ https://issues.apache.org/jira/browse/PHOENIX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056910#comment-14056910 ] Hudson commented on PHOENIX-1069: --- SUCCESS: Integrated in Phoenix | Master | Hadoop1 #263 (See [https://builds.apache.org/job/Phoenix-master-hadoop1/263/]) PHOENIX-1069: Improve CsvBulkLoadTool to build indexes when loading data. (jeffreyz: rev 9bb0b01f68e5da104810c3f1e3adb04ec2ba491f)
* phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkLoadTool.java
* phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvToKeyValueMapper.java
* phoenix-core/src/it/java/org/apache/phoenix/mapreduce/CsvBulkLoadToolIT.java

Improve CsvBulkLoadTool to build indexes when loading data.
Key: PHOENIX-1069
URL: https://issues.apache.org/jira/browse/PHOENIX-1069
Project: Phoenix
Issue Type: Improvement
Affects Versions: 3.0.0, 4.0.0
Reporter: Jeffrey Zhong
Attachments: phoenix-1069.patch

Currently CsvBulkLoadTool only imports the data, not the indexes. It will be convenient for people to load data & indexes at the same time. Hence this JIRA.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (PHOENIX-1069) Improve CsvBulkLoadTool to build indexes when loading data.
[ https://issues.apache.org/jira/browse/PHOENIX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong reassigned PHOENIX-1069: --- Assignee: Jeffrey Zhong

Improve CsvBulkLoadTool to build indexes when loading data.
Key: PHOENIX-1069
URL: https://issues.apache.org/jira/browse/PHOENIX-1069
Project: Phoenix
Issue Type: Improvement
Affects Versions: 3.0.0, 4.0.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 5.0.0, 3.1, 4.1
Attachments: phoenix-1069.patch

Currently CsvBulkLoadTool only imports the data, not the indexes. It will be convenient for people to load data & indexes at the same time. Hence this JIRA.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (PHOENIX-1069) Improve CsvBulkLoadTool to build indexes when loading data.
[ https://issues.apache.org/jira/browse/PHOENIX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong resolved PHOENIX-1069. --- Resolution: Fixed Fix Version/s: 4.1, 3.1, 5.0.0 Thanks for the review! I've integrated the patch into the master, 4.0 & 3.0 branches.

Improve CsvBulkLoadTool to build indexes when loading data.
Key: PHOENIX-1069
URL: https://issues.apache.org/jira/browse/PHOENIX-1069
Project: Phoenix
Issue Type: Improvement
Affects Versions: 3.0.0, 4.0.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 5.0.0, 3.1, 4.1
Attachments: phoenix-1069.patch

Currently CsvBulkLoadTool only imports the data, not the indexes. It will be convenient for people to load data & indexes at the same time. Hence this JIRA.

-- This message was sent by Atlassian JIRA (v6.2#6252)
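For anyone wanting to try the improved tool, here is a hedged sketch of a programmatic invocation. It assumes CsvBulkLoadTool is run as a standard Hadoop Tool; the flag names, table, and paths below are illustrative assumptions, not something shown in this thread.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadWithIndexes {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Flag names and values are assumptions for illustration only.
        int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(), new String[] {
            "--table", "MY_TABLE",
            "--input", "/data/my_table.csv",
            "--zookeeper", "zkhost:2181"
        });
        System.exit(exitCode);
    }
}
{code}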
[jira] [Assigned] (PHOENIX-1077) IN list of row value constructors doesn't work for tenant specific views
[ https://issues.apache.org/jira/browse/PHOENIX-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Levine reassigned PHOENIX-1077: --- Assignee: Eli Levine

IN list of row value constructors doesn't work for tenant specific views
Key: PHOENIX-1077
URL: https://issues.apache.org/jira/browse/PHOENIX-1077
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Samarth Jain
Assignee: Eli Levine

IN list of row value constructors doesn't work when queried against tenant views for multi-tenant Phoenix tables. Consider this test (added in TenantSpecificTablesDMLIT.java):
{code}
public void testRVCOnTenantSpecificTable() throws Exception {
    Connection conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
    try {
        conn.setAutoCommit(true);
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (1, 'BonA')");
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (2, 'BonB')");
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (3, 'BonC')");
        conn.close();
        conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
        PreparedStatement stmt = conn.prepareStatement("select id from " + TENANT_TABLE_NAME
                + " WHERE (id, user) IN ((?, ?), (?, ?), (?, ?))");
        stmt.setInt(1, 1);
        stmt.setString(2, "BonA");
        stmt.setInt(3, 2);
        stmt.setString(4, "BonB");
        stmt.setInt(5, 3);
        stmt.setString(6, "BonC");
        ResultSet rs = stmt.executeQuery();
        assertTrue(rs.next());
        assertEquals(1, rs.getInt(1));
        assertTrue(rs.next());
        assertEquals(2, rs.getInt(1));
        assertTrue(rs.next());
        assertEquals(3, rs.getInt(1));
        assertFalse(rs.next());
    } finally {
        conn.close();
    }
}
{code}
Replacing TENANT_TABLE_NAME with PARENT_TABLE_NAME (that is, the base table), the test works fine.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PHOENIX-1080) Fix PhoenixRuntime.decodepk for salted tables. Add integration tests.
[ https://issues.apache.org/jira/browse/PHOENIX-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-1080: --- Attachment: encodeDecode_master_4.patch
encodeDecode_master_4.patch - for the master and 4.0 branches.

Fix PhoenixRuntime.decodepk for salted tables. Add integration tests.
Key: PHOENIX-1080
URL: https://issues.apache.org/jira/browse/PHOENIX-1080
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Samarth Jain
Assignee: Samarth Jain
Attachments: encodeDecode_master_4.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational
[ https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jay wong updated PHOENIX-1074: --- Issue Type: Bug (was: Wish)

ParallelIteratorRegionSplitterFactory get Splits is not rational
Key: PHOENIX-1074
URL: https://issues.apache.org/jira/browse/PHOENIX-1074
Project: Phoenix
Issue Type: Bug
Reporter: jay wong

Create a table:
{code}
create table if not exists table1(
    gmt VARCHAR NOT NULL,
    spm_type VARCHAR NOT NULL,
    spm VARCHAR NOT NULL,
    A.int_a INTEGER,
    B.int_b INTEGER,
    B.int_c INTEGER
    CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4, bloomfilter='ROW';
{code}
and split the table into 29 partitions as follows:
|startrow|endrow|
| |\x0020140201|
|\x0020140201|\x0020140202|
|\x0020140202|\x0020140203|
|\x0020140203|\x0020140204|
|\x0020140204|\x0020140205|
|\x0020140205|\x0020140206|
|\x0020140206|\x0020140207|
|\x0020140207|\x0120140201|
|\x0120140201|\x0120140202|
|\x0120140202|\x0120140203|
|\x0120140203|\x0120140204|
|\x0120140204|\x0120140205|
|\x0120140205|\x0120140206|
|\x0120140206|\x0120140207|
|\x0120140207|\x0220140201|
|\x0220140201|\x0220140202|
|\x0220140202|\x0220140203|
|\x0220140203|\x0220140204|
|\x0220140204|\x0220140205|
|\x0220140205|\x0220140206|
|\x0220140206|\x0220140207|
|\x0220140207|\x0320140201|
|\x0320140201|\x0320140202|
|\x0320140202|\x0320140203|
|\x0320140203|\x0320140204|
|\x0320140204|\x0320140205|
|\x0320140205|\x0320140206|
|\x0320140206|\x0320140207|
|\x0320140207| |
Then insert some data:
|GMT | SPM_TYPE |SPM | INT_A| INT_B| INT_C |
| 20140201 | 1 | 1.2.3.4546 | 218| 218| null |
| 20140201 | 1 | 1.2.44545 | 190| 190| null |
| 20140201 | 1 | 1.353451312 | 246| 246| null |
| 20140201 | 2 | 1.2.3.6775 | 183| 183| null |
|...|...|...|...|...|...|
| 20140207 | 3 | 1.2.3.4546 | 224| 224| null |
| 20140207 | 3 | 1.2.44545 | 196| 196| null |
| 20140207 | 3 | 1.353451312 | 168| 168| null |
| 20140207 | 4 | 1.2.3.6775 | 189| 189| null |
| 20140207 | 4 | 1.23.345345 | 217| 217| null |
| 20140207 | 4 | 1.23234234234 | 245| 245| null |
Print a log like this:
{code}
public class ParallelIterators extends ExplainTable implements ResultIterators {
    @Override
    public List<PeekingResultIterator> getIterators() throws SQLException {
        boolean success = false;
        final ConnectionQueryServices services = context.getConnection().getQueryServices();
        ReadOnlyProps props = services.getProps();
        int numSplits = splits.size();
        List<PeekingResultIterator> iterators = new ArrayList<PeekingResultIterator>(numSplits);
        List<Pair<byte[], Future<PeekingResultIterator>>> futures =
                new ArrayList<Pair<byte[], Future<PeekingResultIterator>>>(numSplits);
        final UUID scanId = UUID.randomUUID();
        try {
            ExecutorService executor = services.getExecutor();
            System.out.println("the split size is " + numSplits);
        }
    }
}
{code}
Then execute some SQL:
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
the split size is 31
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2'
the split size is 31
select * from table1 where gmt > '20140202' and gmt < '20140207'
the split size is 27
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2' and spm like '1.%'
the split size is 28
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2'
the split size is 28
select * from table1 where gmt > '20140202' and gmt < '20140204'
the split size is 12
{code}
But I think
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
{code}
and
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207'
{code}
should have the same splits, so why don't they?

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057040#comment-14057040 ] Jeffrey Zhong commented on PHOENIX-1056: ---
{quote}
Sometimes building all the data in a single MR job is not only for performance but also for data consistency.
{quote}
But for bulk loading, I think we can safely make the assumption that the input data won't change, no?

A ImportTsv tool for phoenix to build table data and all index data.
Key: PHOENIX-1056
URL: https://issues.apache.org/jira/browse/PHOENIX-1056
Project: Phoenix
Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
Fix For: 3.1
Attachments: PHOENIX-1056.patch

I have just built a tool that builds table data and index table data, just like the ImportTsv job: http://hbase.apache.org/book/ops_mgt.html#importtsv When ImportTsv runs, it writes HFiles into a path named for each column family. For example, if a table has two CFs, A and B, the output is ./outputpath/A and ./outputpath/B. In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo, so the output will be ./outputpath/TableOne/A, ./outputpath/TableOne/B, ./outputpath/IdxOne, and ./outputpath/IdxTwo. If anyone needs it, I will build a clean tool.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (PHOENIX-1072) Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet
[ https://issues.apache.org/jira/browse/PHOENIX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong reopened PHOENIX-1072: --- There are some test failures. Reopening it.

Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet
Key: PHOENIX-1072
URL: https://issues.apache.org/jira/browse/PHOENIX-1072
Project: Phoenix
Issue Type: Improvement
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 5.0.0, 3.1, 4.1
Attachments: phoenix-1072.patch

Currently sqlline.py will retry 35 times to talk to the HBase master when the passed-in quorum string is wrong or the underlying HBase isn't running. In that situation, sqlline will be stuck there forever. This JIRA aims to make sqlline.py fail fast.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1072) Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet
[ https://issues.apache.org/jira/browse/PHOENIX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057060#comment-14057060 ] Hudson commented on PHOENIX-1072: --- SUCCESS: Integrated in Phoenix | 3.0 | Hadoop1 #129 (See [https://builds.apache.org/job/Phoenix-3.0-hadoop1/129/]) Revert PHOENIX-1072: Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet (jeffreyz: rev 6085a4b30ba0702bdf1106136c4ec6364eae2e70)
* bin/log4j.properties
* phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java

Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet
Key: PHOENIX-1072
URL: https://issues.apache.org/jira/browse/PHOENIX-1072
Project: Phoenix
Issue Type: Improvement
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 5.0.0, 3.1, 4.1
Attachments: phoenix-1072.patch

Currently sqlline.py will retry 35 times to talk to the HBase master when the passed-in quorum string is wrong or the underlying HBase isn't running. In that situation, sqlline will be stuck there forever. This JIRA aims to make sqlline.py fail fast.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PHOENIX-1072) Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet
[ https://issues.apache.org/jira/browse/PHOENIX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057061#comment-14057061 ] Hudson commented on PHOENIX-1072: --- SUCCESS: Integrated in Phoenix | Master | Hadoop1 #265 (See [https://builds.apache.org/job/Phoenix-master-hadoop1/265/]) Revert PHOENIX-1072: Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet (jeffreyz: rev a33811cae4d4a0174ed5c3c19502af1871cf732c)
* phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java
* bin/log4j.properties

Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasn't started yet
Key: PHOENIX-1072
URL: https://issues.apache.org/jira/browse/PHOENIX-1072
Project: Phoenix
Issue Type: Improvement
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 5.0.0, 3.1, 4.1
Attachments: phoenix-1072.patch

Currently sqlline.py will retry 35 times to talk to the HBase master when the passed-in quorum string is wrong or the underlying HBase isn't running. In that situation, sqlline will be stuck there forever. This JIRA aims to make sqlline.py fail fast.

-- This message was sent by Atlassian JIRA (v6.2#6252)