[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055974#comment-14055974
 ] 

James Taylor commented on PHOENIX-1056:
---

Thanks, [~jaywong]. That's a good improvement to build both the table data and 
the index data in a single job. Open issues are:
- Do we need both a CSV bulk loader and an ImportTsv tool? How are they 
different? Or can the improvements you made be folded into the CSV bulk loader 
instead? If we do need both, can the ImportTsv tool be built on top of the CSV 
bulk loader?
- The CSV bulk loader uses publicly exposed Phoenix APIs to get at the 
underlying KeyValues and uses the Phoenix table metadata to drive the import, 
while the ImportTsv tool requires the column information to be passed through 
in a somewhat awkward manner (leaving room for discrepancies between the real 
schema and the one passed in). The ImportTsv tool should go through the same 
Phoenix APIs as the CSV bulk loader, IMO (see the sketch below).
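For context, here is a minimal sketch of the kind of API usage meant above. It 
assumes the PhoenixRuntime.getUncommittedDataIterator API; the connection URL 
and table are made up, and the method signature is from memory, so double-check 
it against the CSV bulk loader source:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class KeyValueSketch {
    public static void main(String[] args) throws Exception {
        // Upsert through Phoenix with auto-commit off, then pull out the
        // KeyValues that would have been written, instead of assembling rows
        // and index updates by hand from externally supplied column info.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        conn.setAutoCommit(false);
        conn.createStatement().executeUpdate(
                "UPSERT INTO my_table (id, name) VALUES (1, 'a')");
        Iterator<Pair<byte[], List<KeyValue>>> kvIter =
                PhoenixRuntime.getUncommittedDataIterator(conn);
        while (kvIter.hasNext()) {
            Pair<byte[], List<KeyValue>> tableKvs = kvIter.next();
            // getFirst() is the physical table name (data table or index);
            // getSecond() holds the KeyValues destined for it, ready to be
            // written out as HFiles per table/index.
        }
        conn.rollback(); // nothing is actually sent to the cluster
    }
}
{code}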

Thoughts? Would be interested in your opinions, [~gabriel.reid] and 
[~maghamravikiran]


 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch


 I have just built a tool that builds table data and index table data, much 
 like the ImportTsv job:
 http://hbase.apache.org/book/ops_mgt.html#importtsv
 When ImportTsv runs, it writes HFiles into a path per CF name.
 For example, a table with two CFs, A and B, produces the output
 ./outputpath/A
 ./outputpath/B
 In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo.
 The output will be
 ./outputpath/TableOne/A
 ./outputpath/TableOne/B
 ./outputpath/IdxOne
 ./outputpath/IdxTwo
 If anyone needs it, I will build a clean tool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (PHOENIX-1067) Add documentation for ANY/ALL support with arrays

2014-07-09 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reopened PHOENIX-1067:
---


Thanks for the additional documentation - it looks good, [~ram_krish]. In your 
local svn repo, you need to run the build.sh script. It will generate the html 
based on your md file. Then you need to check in the html file and the site 
will be updated with your changes.

 Add documentation for ANY/ALL support with arrays
 -

 Key: PHOENIX-1067
 URL: https://issues.apache.org/jira/browse/PHOENIX-1067
 Project: Phoenix
  Issue Type: Improvement
Reporter: James Taylor
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: Phoenix-1067.patch


 Please add some documentation and a few examples for the new ANY/ALL support 
 for arrays here: http://phoenix.apache.org/array_type.html
 Our website lives in svn - just checkout 
 https://svn.apache.org/repos/asf/phoenix and find the relevant source file 
 (typically a markdown file with a .md extension), modify it, build the 
 website using the build.sh script and check in the modified files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1067) Add documentation for ANY/ALL support with arrays

2014-07-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056169#comment-14056169
 ] 

ramkrishna.s.vasudevan commented on PHOENIX-1067:
-

Done. And now the site is updated.  Thanks James.

 Add documentation for ANY/ALL support with arrays
 -

 Key: PHOENIX-1067
 URL: https://issues.apache.org/jira/browse/PHOENIX-1067
 Project: Phoenix
  Issue Type: Improvement
Reporter: James Taylor
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: Phoenix-1067.patch


 Please add some documentation and a few examples for the new ANY/ALL support 
 for arrays here: http://phoenix.apache.org/array_type.html
 Our website lives in svn - just checkout 
 https://svn.apache.org/repos/asf/phoenix and find the relevant source file 
 (typically a markdown file with a .md extension), modify it, build the 
 website using the build.sh script and check in the modified files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1016) Support MINVALUE, MAXVALUE, and CYCLE options in CREATE SEQUENCE

2014-07-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056217#comment-14056217
 ] 

James Taylor commented on PHOENIX-1016:
---

Thanks for the patch, [~tdsilva]. It's looking very good. We need to maintain 
backward compatibility, though. Here are some ideas:

How about making the following two changes to maintain backward compatibility 
for startWith?
1. Store minValue and maxValue as null in CreateSequenceStatement if they're 
not specified.
{code}
+protected CreateSequenceStatement(TableName sequenceName,
+        ParseNode startsWith, ParseNode incrementBy, ParseNode cacheSize,
+        ParseNode minValue, ParseNode maxValue, boolean cycle,
+        boolean ifNotExists, int bindCount) {
+    this.sequenceName = sequenceName;
+    this.minValue = minValue == null ? new LiteralParseNode(Long.MIN_VALUE) : minValue;
+    this.maxValue = maxValue == null ? new LiteralParseNode(Long.MAX_VALUE) : maxValue;
{code}

2. Initialize startsWithValue in CreateSequenceCompiler to 1 if minValueNode is 
null and incrementBy is positive or maxValueNode is null and incrementBy is 
negative? 
{code}
+if (startsWithNode == null) {
+    startsWithValue = incrementBy > 0 ? minValue : maxValue;
+} else {
{code}

Can you tell me more about the other change (i.e. why you need to store the 
current value instead of the next value)? This one may be ok, as wouldn't 
existing sequences just skip a sequence value (which is fine)? Or what would 
happen? If they'd return the same value again, that'd be a problem. If we had 
to, we could have a conversion script to increment all sequences, but that'd be 
a bit of a pain so I'd like to avoid it if possible.
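For reference, a hedged sketch of what the proposed syntax from the description 
below might look like through JDBC (the sequence name is made up, and the final 
grammar is whatever this JIRA settles on):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SequenceSyntaxSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        Statement stmt = conn.createStatement();
        // A descending sequence: INCREMENT BY is negative, so MINVALUE is the
        // bound that applies; CYCLE makes it wrap around instead of throwing.
        stmt.execute("CREATE SEQUENCE countdown_seq"
                + " START WITH 100 INCREMENT BY -1"
                + " MINVALUE 1 MAXVALUE 100 CYCLE");
        // NEXT VALUE FOR countdown_seq then yields 100, 99, ..., 1, 100, ...
        conn.close();
    }
}
{code}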

 Support MINVALUE, MAXVALUE, and CYCLE options in CREATE SEQUENCE
 

 Key: PHOENIX-1016
 URL: https://issues.apache.org/jira/browse/PHOENIX-1016
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: Thomas D'Silva
 Attachments: PHOENIX-1016.3.0.patch, PHOENIX-1016.patch


 We currently don't support MINVALUE, MAXVALUE, and CYCLE options in CREATE 
 SEQUENCE, but we should. See 
 http://msdn.microsoft.com/en-us/library/ff878091.aspx for the syntax.
 I believe MINVALUE applies if the INCREMENT is negative while MAXVALUE 
 applies otherwise. If the value of a sequence goes beyond MINVALUE/MAXVALUE, 
 then:
 - if CYCLE is true, then the sequence value should start again at the START 
 WITH value (or the MINVALUE if specified too? Not sure about this).
 - if CYCLE is false, then an exception should be thrown.
 To implement this:
 - make the grammar changes in PhoenixSQL.g
 - add member variables for MINVALUE, MAXVALUE, and CYCLE to 
 CreateSequenceStatement
 - add the appropriate error checking and handle bind variables for these new 
 options in CreateSequenceCompiler
 - modify the MetaDataClient.createSequence() call by passing along these new 
 parameters.
 - same for ConnectionQueryServices.createSequence() call
 - same for Sequence.createSequence().
 - pass along these parameters as new KeyValues in the Append that constitutes 
 the RPC call
 - act on these in the SequenceRegionObserver coprocessor as indicated above.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1071) Provide integration for exposing Phoenix tables as Spark RDDs

2014-07-09 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056222#comment-14056222
 ] 

Josh Mahonin commented on PHOENIX-1071:
---

Hi Andrew,

With the phoenix-pig module, there's a PhoenixInputFormat and a 
PhoenixOutputFormat that Spark can use to create an RDD. I'm able to both read 
and write Phoenix data from Spark in this way.

Example:
{code}
val phoenixConf = new PhoenixPigConfiguration(new Configuration())
phoenixConf.setSelectStatement("SOME SELECT STATEMENT")
phoenixConf.setSelectColumns("COMMA,SEPARATED,COLUMNS")
phoenixConf.setSchemaType(SchemaType.QUERY)
phoenixConf.configure("db-server", "SOME_TABLE", 100L)
val phoenixRDD = sc.newAPIHadoopRDD(phoenixConf.getConfiguration(),
    classOf[PhoenixInputFormat],
    classOf[NullWritable],
    classOf[PhoenixRecord])
{code}

 Provide integration for exposing Phoenix tables as Spark RDDs
 -

 Key: PHOENIX-1071
 URL: https://issues.apache.org/jira/browse/PHOENIX-1071
 Project: Phoenix
  Issue Type: New Feature
Reporter: Andrew Purtell

 A core concept of Apache Spark is the resilient distributed dataset (RDD), a 
 fault-tolerant collection of elements that can be operated on in parallel. 
 One can create RDDs referencing a dataset in any external storage system 
 offering a Hadoop InputFormat, like HBase's TableInputFormat and 
 TableSnapshotInputFormat. Phoenix, as a JDBC driver supporting a SQL dialect, 
 can provide interesting and deep integration. 
 Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} 
 action, implicitly creating necessary schema on demand.
 Add support for {{filter}} transformations that push predicates to the server.
 Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
 {code}
 // Count the number of different coffee varieties offered by each
 // supplier from Guatemala
 phoenixTable("coffees")
   .select(c =>
     where(c.origin == "GT"))
   .countByKey()
   .foreach(r => println(r._1 + " = " + r._2))
 {code} 
 Support conversions between Scala and Java types and Phoenix table data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-09 Thread jay wong (JIRA)
jay wong created PHOENIX-1074:
-

 Summary: ParallelIteratorRegionSplitterFactory get Splits is not 
rational
 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Wish
Reporter: jay wong


create a table 
{code}
create table if not exists table1(
  gmt VARCHAR NOT NULL, 
  spm_type VARCHAR NOT NULL, 
  spm VARCHAR NOT NULL, 
  A.int_a INTEGER, 
  B.int_b INTEGER, 
  B.int_c INTEGER 
  CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4, 
bloomfilter='ROW';
{code}
and partitioned the table as follows:
|startrow|endrow|
| |\x0020140201|
|\x0020140201|\x0020140202|
|\x0020140202|\x0020140203|
|\x0020140203|\x0020140204|
|\x0020140204|\x0020140205| 
|\x0020140205|\x0020140206| 
|\x0020140206|\x0020140207|
|\x0020140207|\x0120140201|
|\x0120140201|\x0120140202|
|\x0120140202|\x0120140203|
|\x0120140203|\x0120140204|
|\x0120140204|\x0120140205|
|\x0120140205|\x0120140206|
|\x0120140206|\x0120140207|
|\x0120140207|\x0220140201|
|\x0220140201|\x0220140202|
|\x0220140202|\x0220140203|
|\x0220140203|\x0220140204|
|\x0220140204|\x0220140205|
|\x0220140205|\x0220140206|
|\x0220140206|\x0220140207|
|\x0220140207|\x0320140201|
|\x0320140201|\x0320140202|
|\x0320140202|\x0320140203|
|\x0320140203|\x0320140204|
|\x0320140204|\x0320140205|
|\x0320140205|\x0320140206|
|\x0320140206|\x0320140207|
|\x0320140207| |

Then insert some data:
|GMT |  SPM_TYPE  |SPM |   INT_A|   INT_B|   INT_C|
| 20140201   | 1  | 1.2.3.4546 | 218| 218| null   |
| 20140201   | 1  | 1.2.44545  | 190| 190| null   |
| 20140201   | 1  | 1.353451312 | 246| 246| null   |
| 20140201   | 2  | 1.2.3.6775 | 183| 183| null   |
|...|...|...|...|...|...|
| 20140207   | 3  | 1.2.3.4546 | 224| 224| null   |
| 20140207   | 3  | 1.2.44545  | 196| 196| null   |
| 20140207   | 3  | 1.353451312 | 168| 168| null   |
| 20140207   | 4  | 1.2.3.6775 | 189| 189| null   |
| 20140207   | 4  | 1.23.345345 | 217| 217| null   |
| 20140207   | 4  | 1.23234234234 | 245| 245| null   |

I added a log line like this:
{code}
public class ParallelIterators extends ExplainTable implements ResultIterators {

    @Override
    public List<PeekingResultIterator> getIterators() throws SQLException {
        boolean success = false;
        final ConnectionQueryServices services = context.getConnection().getQueryServices();
        ReadOnlyProps props = services.getProps();
        int numSplits = splits.size();
        List<PeekingResultIterator> iterators = new ArrayList<PeekingResultIterator>(numSplits);
        List<Pair<byte[], Future<PeekingResultIterator>>> futures = new ArrayList<Pair<byte[], Future<PeekingResultIterator>>>(numSplits);
        final UUID scanId = UUID.randomUUID();
        try {
            ExecutorService executor = services.getExecutor();
            System.out.println("the split size is " + numSplits);
            // ...
        }
    }
}
{code}

Then execute some SQL:
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
the split size is 31
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2'
the split size is 31
select * from table1 where gmt > '20140202' and gmt < '20140207'
the split size is 27
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2' and spm like '1.%'
the split size is 28
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2'
the split size is 28
select * from table1 where gmt > '20140202' and gmt < '20140204'
the split size is 12
{code}

but I think 
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
{code}
and 
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207'
{code}
should produce the same splits, but they don't. Why?






--
This message was sent by Atlassian JIRA
(v6.2#6252)


pull request for local indexes to go into master

2014-07-09 Thread James Taylor
Rajeshbabu has submitted a pull request for local indexes to go into
master. It'd be great if a few other folks could give it a look:
https://github.com/apache/phoenix/pull/1

@JeffreyZ? @Jesse?

Thanks,
James


[jira] [Commented] (PHOENIX-1073) A memory table in every region is needed?

2014-07-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056427#comment-14056427
 ] 

James Taylor commented on PHOENIX-1073:
---

Probably best to start on the dev list to discuss this first, rather than file 
a JIRA. Yes, HBase is much faster if all the data is in its cache.

 A memory table in every region is needed?
 -

 Key: PHOENIX-1073
 URL: https://issues.apache.org/jira/browse/PHOENIX-1073
 Project: Phoenix
  Issue Type: Wish
Reporter: jay wong

 When I do a group by query,
 assume that a Region has 30M of data:
 100K rows with 30 KVs per row.
 The RT of GroupedAggregateRegionObserver is about 3 sec,
 but most of that time, nearly 2.2 sec, is spent on the RegionScanner scanning 
 all the rows.
 I ran a test: the first time, scan all of the data into memory;
 the second time, load the data from memory only.
 The RT of GroupedAggregateRegionObserver is then only 0.7s.
 So perhaps a memory table is needed for Phoenix's computation-intensive scenarios.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (PHOENIX-1073) A memory table in every region is needed?

2014-07-09 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor resolved PHOENIX-1073.
---

Resolution: Invalid

 A memory table in every region is needed?
 -

 Key: PHOENIX-1073
 URL: https://issues.apache.org/jira/browse/PHOENIX-1073
 Project: Phoenix
  Issue Type: Wish
Reporter: jay wong

 When I do a group by query,
 assume that a Region has 30M of data:
 100K rows with 30 KVs per row.
 The RT of GroupedAggregateRegionObserver is about 3 sec,
 but most of that time, nearly 2.2 sec, is spent on the RegionScanner scanning 
 all the rows.
 I ran a test: the first time, scan all of the data into memory;
 the second time, load the data from memory only.
 The RT of GroupedAggregateRegionObserver is then only 0.7s.
 So perhaps a memory table is needed for Phoenix's computation-intensive scenarios.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1062) A SQL Trimmer for log sql execute times

2014-07-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056430#comment-14056430
 ] 

James Taylor commented on PHOENIX-1062:
---

I don't think everyone would universally want to normalize their queries in 
this way.

 A SQL Trimmer for log sql execute times
 ---

 Key: PHOENIX-1062
 URL: https://issues.apache.org/jira/browse/PHOENIX-1062
 Project: Phoenix
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: jay wong
Priority: Critical
 Fix For: 3.1

 Attachments: SQLTrimmer.java


 Suppose we need statistics on how many times each sql executes,
 e.g.: select a,b,c from table1 where d=13 and e='abc' limit 20;
 The literal condition values are not needed and cause overlap,
 so the statement should be trimmed to: select a,b,c from table1 where d=? and e=? limit ?;
 The attached tool does this.
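 A minimal sketch of the kind of normalization described above, using two 
 simple regexes (the attached SQLTrimmer.java presumably handles quoting and 
 edge cases more robustly):
 {code}
 public class SqlTrimSketch {
     public static void main(String[] args) {
         String sql = "select a,b,c from table1 where d=13 and e='abc' limit 20";
         String trimmed = sql
                 .replaceAll("'[^']*'", "?")     // string literals -> ?
                 .replaceAll("\\b\\d+\\b", "?"); // standalone numeric literals -> ?
         // Prints: select a,b,c from table1 where d=? and e=? limit ?
         System.out.println(trimmed);
     }
 }
 {code}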



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1055) Add support for the built-in functions HEX, OCT, and BIN

2014-07-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056436#comment-14056436
 ] 

James Taylor commented on PHOENIX-1055:
---

It's kind of a nice-to-have for every encoder to have a decoder (or vice versa). 
Can you relax that requirement, perhaps having null represent the absence of one 
or the other?

 Add support for the built-in functions HEX, OCT, and BIN 
 -

 Key: PHOENIX-1055
 URL: https://issues.apache.org/jira/browse/PHOENIX-1055
 Project: Phoenix
  Issue Type: New Feature
Reporter: Kyle Buzsaki
 Attachments: PHOENIX-1055.patch, PHOENIX-1055_2.patch


 Add built-in functions to produce hexadecimal, octal, and binary string 
 representations of numeric values.
 Example Function Specification:
 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex
 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_oct
 http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_bin
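 For illustration, a hedged sketch of the expected outputs, mirroring the MySQL 
 functions linked above (plain Java conversions, not the eventual Phoenix 
 implementation):
 {code}
 public class HexOctBinSketch {
     public static void main(String[] args) {
         long value = 255;
         System.out.println(Long.toHexString(value).toUpperCase()); // HEX(255) -> FF
         System.out.println(Long.toOctalString(value));             // OCT(255) -> 377
         System.out.println(Long.toBinaryString(value));            // BIN(255) -> 11111111
     }
 }
 {code}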



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PHOENIX-1075) Mathematical order of operations are improperly evaluated.

2014-07-09 Thread Kyle Buzsaki (JIRA)
Kyle Buzsaki created PHOENIX-1075:
-

 Summary: Mathematical order of operations are improperly evaluated.
 Key: PHOENIX-1075
 URL: https://issues.apache.org/jira/browse/PHOENIX-1075
 Project: Phoenix
  Issue Type: Bug
Reporter: Kyle Buzsaki


The root of the issue is that, as things are now, multiplication and division 
don't actually have the same precedence in the grammar. Division is always 
grouped more tightly than multiplication and is evaluated first. Most of the 
time, this doesn't matter, but combined with the truncating integer division 
used by LongDivideExpression it produces some unexpected and probably wrong 
behavior. Below is an example:

Expression: 6 * 4 / 3
Evaluating left to right, this should reduce as follows:
6 * 4 / 3 
24 / 3
8

As phoenix is now, division has a higher precedence than multiplication. 
Therefore, the resulting expression tree looks like this:

!http://i.imgur.com/2Zzsfpy.png!

Because integer division is truncating, when the division evaluates, the 
expression tree looks like this:

!http://i.imgur.com/3cLGD0e.png!

Which then evaluates to 6.
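A hedged Java sketch of the two groupings, using ordinary Java integer 
arithmetic to stand in for the SQL expressions:
{code}
public class PrecedenceSketch {
    public static void main(String[] args) {
        // Left-to-right evaluation, which SQL and Java both specify:
        System.out.println(6 * 4 / 3);   // (6 * 4) / 3 = 24 / 3 = 8
        // The grouping Phoenix currently produces; integer division
        // truncates 4 / 3 to 1 before the multiply:
        System.out.println(6 * (4 / 3)); // 6 * 1 = 6
    }
}
{code}
Until the grammar is fixed, parenthesizing explicitly, e.g. (6 * 4) / 3, forces 
the intended grouping.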



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1075) Mathematical order of operations are improperly evaluated.

2014-07-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056468#comment-14056468
 ] 

James Taylor commented on PHOENIX-1075:
---

Nice catch. There may be something in ANTLR that'll help us with this. Not sure 
how SQL defines precedence here. In the meantime, users can always use 
parentheses to work around this. I don't think this should delay the work for 
the % operator.

 Mathematical order of operations are improperly evaluated.
 --

 Key: PHOENIX-1075
 URL: https://issues.apache.org/jira/browse/PHOENIX-1075
 Project: Phoenix
  Issue Type: Bug
Reporter: Kyle Buzsaki

 The root of the issue is that, as things are now, multiplication and division 
 don't actually have the same precedence in the grammar. Division is always 
 grouped more tightly than multiplication and is evaluated first. Most of the 
 time, this doesn't matter, but combined with the truncating integer division 
 used by LongDivideExpression it produces some unexpected and probably wrong 
 behavior. Below is an example:
 Expression: 6 * 4 / 3
 Evaluating left to right, this should reduce as follows:
 6 * 4 / 3 
 24 / 3
 8
 As phoenix is now, division has a higher precedence than multiplication. 
 Therefore, the resulting expression tree looks like this:
 !http://i.imgur.com/2Zzsfpy.png!
 Because integer division is truncating, when the division evaluates, the 
 expression tree looks like this:
 !http://i.imgur.com/3cLGD0e.png!
 Which then evaluates to 6.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056538#comment-14056538
 ] 

Jeffrey Zhong commented on PHOENIX-1056:


The other issues can be easily addressed, except aligning the index hfile 
boundaries with region boundaries during the MR job; otherwise 
LoadIncrementalHFiles will become a heavy operation.  

[~jaywong] Have you tried the ImportTsv tool internally, so you can see the 
performance difference between a single MR job (plus loading hfiles) and 
multiple concurrent MR jobs? 





 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch


 I have just built a tool that builds table data and index table data, much 
 like the ImportTsv job:
 http://hbase.apache.org/book/ops_mgt.html#importtsv
 When ImportTsv runs, it writes HFiles into a path per CF name.
 For example, a table with two CFs, A and B, produces the output
 ./outputpath/A
 ./outputpath/B
 In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo.
 The output will be
 ./outputpath/TableOne/A
 ./outputpath/TableOne/B
 ./outputpath/IdxOne
 ./outputpath/IdxTwo
 If anyone needs it, I will build a clean tool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


TO_UNSIGNED_DATE function missing in Phoenix

2014-07-09 Thread anil gupta
Hi All,

Phoenix has a DataType Unsigned_Date, but I am unable to use these
columns for filtering. For a date column in a sql query I can use
to_date(); I think we similarly need a to_unsigned_date function. I
can file a jira for this. Can anyone guide me on how to introduce this
function in the sql language of Phoenix?

-- 
Thanks & Regards,
Anil Gupta


[jira] [Created] (PHOENIX-1076) to_unsigned_date() function missing in sql of Phoenix

2014-07-09 Thread Anil Gupta (JIRA)
Anil Gupta created PHOENIX-1076:
---

 Summary: to_unsigned_date() function missing in sql of Phoenix
 Key: PHOENIX-1076
 URL: https://issues.apache.org/jira/browse/PHOENIX-1076
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Anil Gupta


Hi All,

Phoenix has a DataType Unsigned_Date, but I am unable to use these columns 
for filtering. For a date column in a sql query I can use to_date(); I think 
we similarly need a to_unsigned_date function. Can anyone guide me on how to 
introduce this function in the sql language of Phoenix?

-- 
~Anil Gupta 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PHOENIX-1077) IN list of row value constructors doesn't work for tenant specific views

2014-07-09 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-1077:
-

 Summary: IN list of row value constructors doesn't work for tenant 
specific views
 Key: PHOENIX-1077
 URL: https://issues.apache.org/jira/browse/PHOENIX-1077
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Samarth Jain


IN list of row value constructors doesn't work when queried against tenant 
views for multi-tenant phoenix tables. Consider this test (added in 
TenantSpecificTablesDMLIT.java)

{code}
public void testRVCOnTenantSpecificTable() throws Exception {
    Connection conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
    try {
        conn.setAutoCommit(true);
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (1, 'BonA')");
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (2, 'BonB')");
        conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (3, 'BonC')");

        conn.close();

        conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
        PreparedStatement stmt = conn.prepareStatement("select id from " + TENANT_TABLE_NAME + " WHERE (id, user) IN ((?, ?), (?, ?), (?, ?))");
        stmt.setInt(1, 1);
        stmt.setString(2, "BonA");
        stmt.setInt(3, 2);
        stmt.setString(4, "BonB");
        stmt.setInt(5, 3);
        stmt.setString(6, "BonC");
        ResultSet rs = stmt.executeQuery();
        assertTrue(rs.next());
        assertEquals(1, rs.getInt(1));
        assertTrue(rs.next());
        assertEquals(2, rs.getInt(1));
        assertTrue(rs.next());
        assertEquals(3, rs.getInt(1));
        assertFalse(rs.next());
    }
    finally {
        conn.close();
    }
}

{code}


Replacing TENANT_TABLE_NAME with PARENT_TABLE_NAME (that is the base table), 
the test works fine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: TO_UNSIGNED_DATE function missing in Phoenix

2014-07-09 Thread James Taylor
Hi Anil,

Try using CAST to explicitly cast the result to an unsigned date, like
this: CAST(TO_DATE(someDate) AS UNSIGNED_DATE)

Thanks,
James

On Wed, Jul 9, 2014 at 8:38 PM, anil gupta anilgupt...@gmail.com wrote:
 Hi All,

 Phoenix has a DataType Unsigned_Date but now i am unable to use these
 columns for filtering. For using a date column in sql query i can use
 to_date(). I think similarly we need to have to_unsigned_date function. I
 can file a jira for this. Can anyone guide me how to introduce this
 function in sql language of Phoenix.

 --
  Thanks & Regards,
 Anil Gupta


[jira] [Commented] (PHOENIX-1076) to_unsigned_date() function missing in sql of Phoenix

2014-07-09 Thread Anil Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056606#comment-14056606
 ] 

Anil Gupta commented on PHOENIX-1076:
-

[~jamestaylor]: Can you provide me the high-level steps to implement the 
to_unsigned_date() function?

 to_unsigned_date() function missing in sql of Phoenix
 -

 Key: PHOENIX-1076
 URL: https://issues.apache.org/jira/browse/PHOENIX-1076
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Anil Gupta

 Hi All,
 Phoenix has a DataType Unsigned_Date, but I am unable to use these columns 
 for filtering. For a date column in a sql query I can use to_date(); I 
 think we similarly need a to_unsigned_date function. Can anyone guide 
 me on how to introduce this function in the sql language of Phoenix?
 -- 
 ~Anil Gupta 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: TO_UNSIGNED_DATE function missing in Phoenix

2014-07-09 Thread anil gupta
Hi James,
I tried following query:
select * from events where dummy_date=CAST(TO_DATE('2012-12-23',
'yyyy-MM-dd') AS UNSIGNED_DATE) and id='1234' limit 50;

But, i get following error:
Error: ERROR 602 (42P00): Syntax error. Missing LPAREN at line 1, column
30. (state=42P00,code=602)

Can you tell me what's wrong here?

Thanks,
Anil Gupta


On Wed, Jul 9, 2014 at 11:48 AM, James Taylor jamestay...@apache.org
wrote:

 Hi Anil,

 Try using CAST to explicitly cast the result to an unsigned date, like
 this: CAST(TO_DATE(someDate) AS UNSIGNED_DATE)

 Thanks,
 James

 On Wed, Jul 9, 2014 at 8:38 PM, anil gupta anilgupt...@gmail.com wrote:
  Hi All,
 
  Phoenix has a DataType Unsigned_Date but now i am unable to use these
  columns for filtering. For using a date column in sql query i can use
  to_date(). I think similarly we need to have to_unsigned_date function. I
  can file a jira for this. Can anyone guide me how to introduce this
  function in sql language of Phoenix.
 
  --
  Thanks & Regards,
  Anil Gupta




-- 
Thanks & Regards,
Anil Gupta


[jira] [Commented] (PHOENIX-938) Use higher priority queue for index updates to prevent deadlock

2014-07-09 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056785#comment-14056785
 ] 

Jesse Yates commented on PHOENIX-938:
-

Started looking at this today. At first glance, 0.98.3 is pretty simple. Then 
it starts to get complicated, as 0.98.4 completely changed how rpc scheduling 
is implemented (HBASE-11355). I don't know if we have the bandwidth to 
continually monitor all the possible changes to the scheduler code to support 
this. Further, as we look to real transactions, this implementation becomes 
somewhat moot; maybe we just leave the code as-is?

The nitty-gritty of it is that 0.98.4 introduced the idea of an RpcExecutor 
(which is a great improvement over the current munging), but that isn't in 
0.98.3, so we would need to port that class to phoenix (losing any updates 
from the HBase community) - though that's kinda already what I was doing with 
this patch, so maybe that's alright for now. 

Now, we could have a whole reflection framework to support the different HBase 
versions we are running (which becomes a testing pain, but doable) and then 
pick the most optimal path (0.98.4+ just uses RpcExecutor as-is, 0.98.3 uses 
the copied code, <=0.98.2 ignores it). Or we can copy the changed 
implementations back and just use the same thing everywhere, but then we lose 
out on changes... There really isn't a clean solution here :-/

Really, this stems from the RpcScheduler code being a private interface in 
HBase but wanting to leverage it outside HBase.

thoughts [~jamestaylor]?

 Use higher priority queue for index updates to prevent deadlock
 ---

 Key: PHOENIX-938
 URL: https://issues.apache.org/jira/browse/PHOENIX-938
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.0.0, 4.1
Reporter: James Taylor
Assignee: Jesse Yates
 Fix For: 5.0.0, 4.1

 Attachments: phoenix-938-4.0-v0.patch, phoenix-938-master-v0.patch, 
 phoenix-938-master-v1.patch


 With our current global secondary indexing solution, a batched Put of table 
 data causes a RS to do a batch Put to other RSs. This has the potential to 
 lead to a deadlock if all RS are overloaded and unable to process the pending 
 batched Put. To prevent this, we should use a higher priority queue to submit 
 these Puts so that they're always processed before other Puts. This will 
 prevent the potential for a deadlock under high load. Note that this will 
 likely require some HBase 0.98 code changes and would not be feasible to 
 implement for HBase 0.94.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1071) Provide integration for exposing Phoenix tables as Spark RDDs

2014-07-09 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056796#comment-14056796
 ] 

Andrew Purtell commented on PHOENIX-1071:
-

Thanks [~jmahonin], good point, we might start with the existing input/output 
formats as suggested for HBase in HBASE-11482. 

 Provide integration for exposing Phoenix tables as Spark RDDs
 -

 Key: PHOENIX-1071
 URL: https://issues.apache.org/jira/browse/PHOENIX-1071
 Project: Phoenix
  Issue Type: New Feature
Reporter: Andrew Purtell

 A core concept of Apache Spark is the resilient distributed dataset (RDD), a 
 fault-tolerant collection of elements that can be operated on in parallel. 
 One can create RDDs referencing a dataset in any external storage system 
 offering a Hadoop InputFormat, like HBase's TableInputFormat and 
 TableSnapshotInputFormat. Phoenix, as a JDBC driver supporting a SQL dialect, 
 can provide interesting and deep integration. 
 Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} 
 action, implicitly creating necessary schema on demand.
 Add support for {{filter}} transformations that push predicates to the server.
 Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
 {code}
 // Count the number of different coffee varieties offered by each
 // supplier from Guatemala
 phoenixTable("coffees")
   .select(c =>
     where(c.origin == "GT"))
   .countByKey()
   .foreach(r => println(r._1 + " = " + r._2))
 {code} 
 Support conversions between Scala and Java types and Phoenix table data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PHOENIX-1071) Provide integration for exposing Phoenix tables as Spark RDDs

2014-07-09 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated PHOENIX-1071:


Description: 
A core concept of Apache Spark is the resilient distributed dataset (RDD), a 
fault-tolerant collection of elements that can be operated on in parallel. 
One can create RDDs referencing a dataset in any external storage system 
offering a Hadoop InputFormat, like PhoenixInputFormat and PhoenixOutputFormat. 
There could be opportunities for additional interesting and deep integration. 

Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} 
action, implicitly creating necessary schema on demand.

Add support for {{filter}} transformations that push predicates to the server.

Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
{code}
// Count the number of different coffee varieties offered by each
// supplier from Guatemala
phoenixTable("coffees")
  .select(c =>
    where(c.origin == "GT"))
  .countByKey()
  .foreach(r => println(r._1 + " = " + r._2))
{code} 

Support conversions between Scala and Java types and Phoenix table data.

  was:
A core concept of Apache Spark is the resilient distributed dataset (RDD), a 
fault-tolerant collection of elements that can be operated on in parallel. 
One can create RDDs referencing a dataset in any external storage system 
offering a Hadoop InputFormat, like HBase's TableInputFormat and 
TableSnapshotInputFormat. Phoenix, as a JDBC driver supporting a SQL dialect, 
can provide interesting and deep integration. 

Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} 
action, implicitly creating necessary schema on demand.

Add support for {{filter}} transformations that push predicates to the server.

Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
{code}
// Count the number of different coffee varieties offered by each
// supplier from Guatemala
phoenixTable("coffees")
  .select(c =>
    where(c.origin == "GT"))
  .countByKey()
  .foreach(r => println(r._1 + " = " + r._2))
{code} 

Support conversions between Scala and Java types and Phoenix table data.


 Provide integration for exposing Phoenix tables as Spark RDDs
 -

 Key: PHOENIX-1071
 URL: https://issues.apache.org/jira/browse/PHOENIX-1071
 Project: Phoenix
  Issue Type: New Feature
Reporter: Andrew Purtell

 A core concept of Apache Spark is the resilient distributed dataset (RDD), a 
 fault-tolerant collection of elements that can be operated on in parallel. 
 One can create RDDs referencing a dataset in any external storage system 
 offering a Hadoop InputFormat, like PhoenixInputFormat and 
 PhoenixOutputFormat. There could be opportunities for additional interesting 
 and deep integration. 
 Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} 
 action, implicitly creating necessary schema on demand.
 Add support for {{filter}} transformations that push predicates to the server.
 Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
 {code}
 // Count the number of different coffee varieties offered by each
 // supplier from Guatemala
 phoenixTable("coffees")
   .select(c =>
     where(c.origin == "GT"))
   .countByKey()
   .foreach(r => println(r._1 + " = " + r._2))
 {code} 
 Support conversions between Scala and Java types and Phoenix table data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-938) Use higher priority queue for index updates to prevent deadlock

2014-07-09 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056818#comment-14056818
 ] 

Andrew Purtell commented on PHOENIX-938:


I linked over to HBASE-11355 with a comment pointing here so watchers on that 
issue can find their way over.

 Use higher priority queue for index updates to prevent deadlock
 ---

 Key: PHOENIX-938
 URL: https://issues.apache.org/jira/browse/PHOENIX-938
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.0.0, 4.1
Reporter: James Taylor
Assignee: Jesse Yates
 Fix For: 5.0.0, 4.1

 Attachments: phoenix-938-4.0-v0.patch, phoenix-938-master-v0.patch, 
 phoenix-938-master-v1.patch


 With our current global secondary indexing solution, a batched Put of table 
 data causes a RS to do a batch Put to other RSs. This has the potential to 
 lead to a deadlock if all RS are overloaded and unable to process the pending 
 batched Put. To prevent this, we should use a higher priority queue to submit 
 these Puts so that they're always processed before other Puts. This will 
 prevent the potential for a deadlock under high load. Note that this will 
 likely require some HBase 0.98 code changes and would not be feasible to 
 implement for HBase 0.94.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-938) Use higher priority queue for index updates to prevent deadlock

2014-07-09 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056848#comment-14056848
 ] 

Jesse Yates commented on PHOENIX-938:
-

Thanks Andy! Perhaps I'm just a bit too tired today to find my way to the 
community solution :)

 Use higher priority queue for index updates to prevent deadlock
 ---

 Key: PHOENIX-938
 URL: https://issues.apache.org/jira/browse/PHOENIX-938
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.0.0, 4.1
Reporter: James Taylor
Assignee: Jesse Yates
 Fix For: 5.0.0, 4.1

 Attachments: phoenix-938-4.0-v0.patch, phoenix-938-master-v0.patch, 
 phoenix-938-master-v1.patch


 With our current global secondary indexing solution, a batched Put of table 
 data causes a RS to do a batch Put to other RSs. This has the potential to 
 lead to a deadlock if all RS are overloaded and unable to process the pending 
 batched Put. To prevent this, we should use a higher priority queue to submit 
 these Puts so that they're always processed before other Puts. This will 
 prevent the potential for a deadlock under high load. Note that this will 
 likely require some HBase 0.98 code changes and would not be feasible to 
 implement for HBase 0.94.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: TO_UNSIGNED_DATE function missing in Phoenix

2014-07-09 Thread James Taylor
Yes, that should work. Looks like a bug to me.

On Wednesday, July 9, 2014, anil gupta anilgupt...@gmail.com wrote:

 Hi James,

 I am using Phoenix3.0 along with HBase0.94.15. Do you mean to say that this
 query should work even if dummy_date is unsigned_date?
  select * from events where dummy_date=TO_DATE('2012-12-23', 'yyyy-MM-dd')
 and id='1234' limit 50;

 I tried this query and i didn't get correct results. I have data in the
 table where dummy_date = 2012-12-23.

 ~Anil



 On Wed, Jul 9, 2014 at 12:13 PM, James Taylor jamestay...@apache.org
 wrote:

  Not sure, as it looks correct. What version of Phoenix are you using?
 FWIW,
  that query should work w/out the cast too.
 
  Thanks,
  James
 
  On Wednesday, July 9, 2014, anil gupta anilgupt...@gmail.com
  wrote:
 
   Hi James,
   I tried following query:
    select * from events where dummy_date=CAST(TO_DATE('2012-12-23',
    'yyyy-MM-dd') AS UNSIGNED_DATE) and id='1234' limit 50;
  
   But, i get following error:
   Error: ERROR 602 (42P00): Syntax error. Missing LPAREN at line 1,
  column
   30. (state=42P00,code=602)
  
   Can you tell me whats wrong here?
  
   Thanks,
   Anil Gupta
  
  
   On Wed, Jul 9, 2014 at 11:48 AM, James Taylor jamestay...@apache.org
   wrote:
  
Hi Anil,
   
Try using CAST to explicitly cast the result to an unsigned date,
 like
this: CAST(TO_DATE(someDate) AS UNSIGNED_DATE)
   
Thanks,
James
   
On Wed, Jul 9, 2014 at 8:38 PM, anil gupta anilgupt...@gmail.com
    wrote:
 Hi All,

 Phoenix has a DataType Unsigned_Date but now i am unable to use
 these
 columns for filtering. For using a date column in sql query i can
 use
 to_date(). I think similarly we need to have to_unsigned_date
   function. I
 can file a jira for this. Can anyone guide me how to introduce this
 function in sql language of Phoenix.

 --
  Thanks & Regards,
 Anil Gupta
   
  
  
  
   --
    Thanks & Regards,
   Anil Gupta
  
 



 --
 Thanks & Regards,
 Anil Gupta



[jira] [Created] (PHOENIX-1078) Unable to run pig script with Phoenix.

2014-07-09 Thread Anil Gupta (JIRA)
Anil Gupta created PHOENIX-1078:
---

 Summary: Unable to run pig script with Phoenix.
 Key: PHOENIX-1078
 URL: https://issues.apache.org/jira/browse/PHOENIX-1078
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Anil Gupta


I am running HBase0.94.15 and the latest phoenix 3.1 nightly build. I have to 
use pig on phoenix views. When I run the job I get the following error:
ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable - Error in readFields
java.lang.NegativeArraySizeException: -1
at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:175)
at org.apache.phoenix.schema.PColumnImpl.readFields(PColumnImpl.java:157)
at org.apache.phoenix.schema.PTableImpl.readFields(PTableImpl.java:721)
at 
org.apache.phoenix.coprocessor.MetaDataProtocol$MetaDataMutationResult.readFields(MetaDataProtocol.java:161)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:692)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:596)
at 
org.apache.hadoop.hbase.client.coprocessor.ExecResult.readFields(ExecResult.java:83)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:692)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:333)
at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.receiveResponse(SecureClient.java:383)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:588)

Here is my pig script:
register /tmp/phoenix-3.1-jars/phoenix-core-3.1.0-SNAPSHOT.jar;
register /tmp/phoenix-3.1-jars/phoenix-pig-3.1.0-SNAPSHOT.jar;
A = load 'hbase://query/SELECT * from test_table' using 
org.apache.phoenix.pig.PhoenixHBaseLoader('ZK');
grpd = GROUP A BY UNSIGNED_DATE_COLUMN;
cnt = FOREACH grpd GENERATE group AS UNSIGNED_DATE_COLUMN,COUNT(A);
DUMP cnt;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1069) Improve CsvBulkLoadTool to build indexes when loading data.

2014-07-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056910#comment-14056910
 ] 

Hudson commented on PHOENIX-1069:
-

SUCCESS: Integrated in Phoenix | Master | Hadoop1 #263 (See 
[https://builds.apache.org/job/Phoenix-master-hadoop1/263/])
PHOENIX-1069: Improve CsvBulkLoadTool to build indexes when loading data. 
(jeffreyz: rev 9bb0b01f68e5da104810c3f1e3adb04ec2ba491f)
* phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkLoadTool.java
* 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvToKeyValueMapper.java
* phoenix-core/src/it/java/org/apache/phoenix/mapreduce/CsvBulkLoadToolIT.java


 Improve CsvBulkLoadTool to build indexes when loading data.
 ---

 Key: PHOENIX-1069
 URL: https://issues.apache.org/jira/browse/PHOENIX-1069
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 3.0.0, 4.0.0
Reporter: Jeffrey Zhong
 Attachments: phoenix-1069.patch


 Currently CsvBulkLoadTool only imports data, not the indexes. It will be 
 convenient for people to load data & indexes at the same time. Hence the 
 JIRA. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (PHOENIX-1069) Improve CsvBulkLoadTool to build indexes when loading data.

2014-07-09 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong reassigned PHOENIX-1069:
--

Assignee: Jeffrey Zhong

 Improve CsvBulkLoadTool to build indexes when loading data.
 ---

 Key: PHOENIX-1069
 URL: https://issues.apache.org/jira/browse/PHOENIX-1069
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 3.0.0, 4.0.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 5.0.0, 3.1, 4.1

 Attachments: phoenix-1069.patch


 Currently CsvBulkLoadTool only imports data, not the indexes. It will be 
 convenient for people to load data & indexes at the same time. Hence the 
 JIRA. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (PHOENIX-1069) Improve CsvBulkLoadTool to build indexes when loading data.

2014-07-09 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong resolved PHOENIX-1069.


   Resolution: Fixed
Fix Version/s: 4.1
   3.1
   5.0.0

Thanks for the review! I've integrated the patch into the master, 4.0 & 3.0 branches.

 Improve CsvBulkLoadTool to build indexes when loading data.
 ---

 Key: PHOENIX-1069
 URL: https://issues.apache.org/jira/browse/PHOENIX-1069
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 3.0.0, 4.0.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 5.0.0, 3.1, 4.1

 Attachments: phoenix-1069.patch


 Currently CsvBulkLoadTool only imports data, not the indexes. It will be 
 convenient for people to load data & indexes at the same time. Hence the 
 JIRA. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (PHOENIX-1077) IN list of row value constructors doesn't work for tenant specific views

2014-07-09 Thread Eli Levine (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Levine reassigned PHOENIX-1077:
---

Assignee: Eli Levine

 IN list of row value constructors doesn't work for tenant specific views
 

 Key: PHOENIX-1077
 URL: https://issues.apache.org/jira/browse/PHOENIX-1077
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Samarth Jain
Assignee: Eli Levine

 IN list of row value constructors doesn't work when queried against tenant 
 views for multi-tenant phoenix tables. Consider this test (added in 
 TenantSpecificTablesDMLIT.java)
 {code}
 public void testRVCOnTenantSpecificTable() throws Exception {
     Connection conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
     try {
         conn.setAutoCommit(true);
         conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (1, 'BonA')");
         conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (2, 'BonB')");
         conn.createStatement().executeUpdate("upsert into " + TENANT_TABLE_NAME + " (id, user) values (3, 'BonC')");
         conn.close();
         conn = nextConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
         PreparedStatement stmt = conn.prepareStatement("select id from " + TENANT_TABLE_NAME + " WHERE (id, user) IN ((?, ?), (?, ?), (?, ?))");
         stmt.setInt(1, 1);
         stmt.setString(2, "BonA");
         stmt.setInt(3, 2);
         stmt.setString(4, "BonB");
         stmt.setInt(5, 3);
         stmt.setString(6, "BonC");
         ResultSet rs = stmt.executeQuery();
         assertTrue(rs.next());
         assertEquals(1, rs.getInt(1));
         assertTrue(rs.next());
         assertEquals(2, rs.getInt(1));
         assertTrue(rs.next());
         assertEquals(3, rs.getInt(1));
         assertFalse(rs.next());
     }
     finally {
         conn.close();
     }
 }
 {code}
 Replacing TENANT_TABLE_NAME with PARENT_TABLE_NAME (that is the base table), 
 the test works fine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PHOENIX-1080) Fix PhoenixRuntime.decodepk for salted tables. Add integration tests.

2014-07-09 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-1080:
--

Attachment: encodeDecode_master_4.patch

encodeDecode_master_4.patch - for master and 4.0 branches.

 Fix PhoenixRuntime.decodepk for salted tables. Add integration tests.
 -

 Key: PHOENIX-1080
 URL: https://issues.apache.org/jira/browse/PHOENIX-1080
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Samarth Jain
Assignee: Samarth Jain
 Attachments: encodeDecode_master_4.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-09 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1074:
--

Issue Type: Bug  (was: Wish)

 ParallelIteratorRegionSplitterFactory get Splits is not rational
 

 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong

 create a table 
 {code}
 create table if not exists table1(
   gmt VARCHAR NOT NULL, 
   spm_type VARCHAR NOT NULL, 
   spm VARCHAR NOT NULL, 
   A.int_a INTEGER, 
   B.int_b INTEGER, 
   B.int_c INTEGER 
   CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4, 
 bloomfilter='ROW';
 {code}
 and partitioned the table into 29 regions as follows:
 |startrow|endrow|
 | |\x0020140201|
 |\x0020140201|\x0020140202|
 |\x0020140202|\x0020140203|
 |\x0020140203|\x0020140204|
 |\x0020140204|\x0020140205|   
 |\x0020140205|\x0020140206|   
 |\x0020140206|\x0020140207|
 |\x0020140207|\x0120140201|
 |\x0120140201|\x0120140202|
 |\x0120140202|\x0120140203|
 |\x0120140203|\x0120140204|
 |\x0120140204|\x0120140205|
 |\x0120140205|\x0120140206|
 |\x0120140206|\x0120140207|
 |\x0120140207|\x0220140201|
 |\x0220140201|\x0220140202|
 |\x0220140202|\x0220140203|
 |\x0220140203|\x0220140204|
 |\x0220140204|\x0220140205|
 |\x0220140205|\x0220140206|
 |\x0220140206|\x0220140207|
 |\x0220140207|\x0320140201|
 |\x0320140201|\x0320140202|
 |\x0320140202|\x0320140203|
 |\x0320140203|\x0320140204|
 |\x0320140204|\x0320140205|
 |\x0320140205|\x0320140206|
 |\x0320140206|\x0320140207|
 |\x0320140207| |  
 Then insert some data:
 |GMT |  SPM_TYPE  |SPM |   INT_A|   INT_B|   INT_C|
 | 20140201   | 1  | 1.2.3.4546 | 218| 218| null   |
 | 20140201   | 1  | 1.2.44545  | 190| 190| null   |
 | 20140201   | 1  | 1.353451312 | 246| 246| null   |
 | 20140201   | 2  | 1.2.3.6775 | 183| 183| null   |
 |...|...|...|...|...|...|
 | 20140207   | 3  | 1.2.3.4546 | 224| 224| null   |
 | 20140207   | 3  | 1.2.44545  | 196| 196| null   |
 | 20140207   | 3  | 1.353451312 | 168| 168| null   |
 | 20140207   | 4  | 1.2.3.6775 | 189| 189| null   |
 | 20140207   | 4  | 1.23.345345 | 217| 217| null   |
 | 20140207   | 4  | 1.23234234234 | 245| 245| null   |
 I added a log line like this:
 {code}
 public class ParallelIterators extends ExplainTable implements ResultIterators {
 
     @Override
     public List<PeekingResultIterator> getIterators() throws SQLException {
         boolean success = false;
         final ConnectionQueryServices services = context.getConnection().getQueryServices();
         ReadOnlyProps props = services.getProps();
         int numSplits = splits.size();
         List<PeekingResultIterator> iterators = new ArrayList<PeekingResultIterator>(numSplits);
         List<Pair<byte[], Future<PeekingResultIterator>>> futures = new ArrayList<Pair<byte[], Future<PeekingResultIterator>>>(numSplits);
         final UUID scanId = UUID.randomUUID();
         try {
             ExecutorService executor = services.getExecutor();
             System.out.println("the split size is " + numSplits);
             // ...
         }
     }
 }
 {code}
 Then execute some SQL:
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207'
 the split size is 27
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2' and spm like '1.%'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204'
 the split size is 12
 {code}
 but I think 
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
 {code}
 and 
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' 
 {code}
 should produce the same splits, but they don't. Why?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057040#comment-14057040
 ] 

Jeffrey Zhong commented on PHOENIX-1056:


{quote}
Sometimes building all the data in a single MR is not only for performance, but 
also for data consistency.
{quote}
But for bulk loading, I think we can safely make the assumption that input data 
won't change, no?

 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch


 I have just build a tool for build table data and index table data just like 
 ImportTsv job.
 http://hbase.apache.org/book/ops_mgt.html#importtsv
 when ImportTsv work it write HFile in a CF name path.
 for example A table has two cf, A and B.
 the output is 
 ./outputpath/A
 ./outputpath/B
 In my job. we has a table.  TableOne. and two Index IdxOne, IdxTwo.
 the output will be
 ./outputpath/TableOne/A
 ./outputpath/TableOne/B
 ./outputpath/IdxOne
 ./outputpath/IdxTwo.
 If anyone need it .I will build a clean tool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (PHOENIX-1072) Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasnt' started yet

2014-07-09 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong reopened PHOENIX-1072:



There are some test failures. Reopening it.

 Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasnt' 
 started yet 
 ---

 Key: PHOENIX-1072
 URL: https://issues.apache.org/jira/browse/PHOENIX-1072
 Project: Phoenix
  Issue Type: Improvement
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 5.0.0, 3.1, 4.1

 Attachments: phoenix-1072.patch


 Currently sqlline.py will retry 35 times to talk to the HBase master when the 
 passed-in quorum string is wrong or the underlying HBase isn't running. 
 In that situation, sqlline will be stuck there forever. This JIRA aims to 
 make sqlline.py fail fast.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1072) Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasnt' started yet

2014-07-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057060#comment-14057060
 ] 

Hudson commented on PHOENIX-1072:
-

SUCCESS: Integrated in Phoenix | 3.0 | Hadoop1 #129 (See 
[https://builds.apache.org/job/Phoenix-3.0-hadoop1/129/])
Revert PHOENIX-1072: Fast fail sqlline.py when pass wrong quorum string or 
hbase cluster hasnt' started yet (jeffreyz: rev 
6085a4b30ba0702bdf1106136c4ec6364eae2e70)
* bin/log4j.properties
* 
phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java


 Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasnt' 
 started yet 
 ---

 Key: PHOENIX-1072
 URL: https://issues.apache.org/jira/browse/PHOENIX-1072
 Project: Phoenix
  Issue Type: Improvement
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 5.0.0, 3.1, 4.1

 Attachments: phoenix-1072.patch


 Currently sqlline.py will retry 35 times to talk to the HBase master when the 
 passed-in quorum string is wrong or the underlying HBase isn't running. 
 In that situation, sqlline will be stuck there forever. This JIRA aims to 
 make sqlline.py fail fast.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1072) Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasnt' started yet

2014-07-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057061#comment-14057061
 ] 

Hudson commented on PHOENIX-1072:
-

SUCCESS: Integrated in Phoenix | Master | Hadoop1 #265 (See 
[https://builds.apache.org/job/Phoenix-master-hadoop1/265/])
Revert PHOENIX-1072: Fast fail sqlline.py when pass wrong quorum string or 
hbase cluster hasnt' started yet (jeffreyz: rev 
a33811cae4d4a0174ed5c3c19502af1871cf732c)
* 
phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java
* bin/log4j.properties


 Fast fail sqlline.py when pass wrong quorum string or hbase cluster hasnt' 
 started yet 
 ---

 Key: PHOENIX-1072
 URL: https://issues.apache.org/jira/browse/PHOENIX-1072
 Project: Phoenix
  Issue Type: Improvement
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 5.0.0, 3.1, 4.1

 Attachments: phoenix-1072.patch


 Currently sqlline.py will retry 35 times to talk to the HBase master when the 
 passed-in quorum string is wrong or the underlying HBase isn't running. 
 In that situation, sqlline will be stuck there forever. This JIRA aims to 
 make sqlline.py fail fast.



--
This message was sent by Atlassian JIRA
(v6.2#6252)