[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-10 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057258#comment-14057258
 ] 

James Taylor commented on PHOENIX-1056:
---

Good point, [~jaywong].

 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch


 I have just built a tool that builds table data and index table data, much 
 like the ImportTsv job:
 http://hbase.apache.org/book/ops_mgt.html#importtsv
 When ImportTsv runs, it writes HFiles into a path named after each column 
 family. For example, if a table has two column families, A and B, the 
 output is:
 ./outputpath/A
 ./outputpath/B
 In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo. 
 The output will be:
 ./outputpath/TableOne/A
 ./outputpath/TableOne/B
 ./outputpath/IdxOne
 ./outputpath/IdxTwo
 If anyone needs it, I will build a clean tool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055974#comment-14055974
 ] 

James Taylor commented on PHOENIX-1056:
---

Thanks, [~jaywong]. That's a good improvement to build both the table data and 
the index data in a single job. Open issues are:
- Do we need both a CSV bulk loader and an ImportTsv tool? How are they 
different? Or can the improvements you made be folded into the CSV bulk loader 
instead? If we do need both, can the ImportTsv tool be built on top of the CSV 
bulk loader?
- The CSV bulk loader uses publicly exposed Phoenix APIs to get at the 
underlying KeyValues and uses the Phoenix table metadata to drive the import, 
while the ImportTsv tool requires the column information to be passed through 
in a somewhat awkward manner (leaving room for discrepancies between the real 
schema and the one passed in). The ImportTsv tool should go through the same 
Phoenix APIs as the CSV bulk loader, IMO.

Thoughts? Would be interested in your opinions, [~gabriel.reid] and 
[~maghamravikiran]




[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056538#comment-14056538
 ] 

Jeffrey Zhong commented on PHOENIX-1056:


The other issues can be easily addressed, except for aligning the index HFiles 
with region boundaries during the MR job; otherwise LoadIncrementalHFiles 
becomes a heavy operation.

[~jaywong] Have you tried the ImportTsv tool internally, so that you can see 
the performance difference between one single MR job (plus loading hfiles) and 
multiple concurrent MR jobs?







[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057040#comment-14057040
 ] 

Jeffrey Zhong commented on PHOENIX-1056:


{quote}
Sometimes building all the data in a single MR job is not only for performance 
but also for data consistency.
{quote}
But for bulk loading, I think we can safely assume that the input data won't 
change, no?



[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-08 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055150#comment-14055150
 ] 

James Taylor commented on PHOENIX-1056:
---

What functionality does this add above and beyond our Bulk CSV loader? 
http://phoenix.apache.org/bulk_dataload.html

One thing might be the ability to import both data and indexes. It'd be a nice 
addition to the Bulk CSV loader to bulk load indexes together with their data.



[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055387#comment-14055387
 ] 

Jeffrey Zhong commented on PHOENIX-1056:


Oh, I'm late to see this JIRA. I had a different patch that loads index and 
table data in one go by submitting multiple MR jobs to load data concurrently 
for CsvBulkLoadTool.

[~jaywong]'s approach uses a single MR job to load both data and index data. I 
checked the patch and the underlying idea is very good, but it has one issue: 
the partitioning is done on the primary table. Therefore, the index table 
HFiles aren't aligned with the index table's own partitioning, and loading 
those generated index HFiles will incur extra writes.

Let me first create a separate JIRA to improve CsvBulkLoadTool to build 
indexes at load time, and later we can decide whether to migrate 
CsvBulkLoadTool to use this JIRA's custom mapper, reducer, and 
MultiHFileOutputFormat.
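The alignment issue described above can be sketched with a toy simulation 
(illustrative Python only; the keys, split points, and two-region layout are 
invented, and no Phoenix or HBase APIs are used):

```python
# Toy illustration: when every KeyValue (data AND index) is partitioned by
# the data-table row key, each reducer's index HFile ends up spanning
# multiple index-table regions.
import bisect

# Data rows: (data_row_key, indexed_value). Index row key = indexed value.
rows = [("r01", "zebra"), ("r02", "apple"), ("r03", "mango"),
        ("r04", "bee"),   ("r05", "tiger"), ("r06", "cat")]

data_splits = ["r04"]   # data table regions: [min, "r04"), ["r04", max)
index_splits = ["m"]    # index table regions: [min, "m"), ["m", max)

def partition(key, splits):
    """Reducer/region index for a key, given sorted split points."""
    return bisect.bisect_right(splits, key)

# One index HFile per reducer, partitioned by the DATA key.
hfiles = {}
for data_key, value in rows:
    reducer = partition(data_key, data_splits)
    hfiles.setdefault(reducer, []).append(value)   # index row keys

# How many index regions does each reducer's HFile span?
spans = {r: len({partition(k, index_splits) for k in keys})
         for r, keys in hfiles.items()}
print(spans)  # {0: 2, 1: 2}
```

Each reducer's index HFile covers index keys from both index regions, which is 
why LoadIncrementalHFiles would have to split those files at load time, making 
the load a heavy operation.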



[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-08 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055809#comment-14055809
 ] 

jay wong commented on PHOENIX-1056:
---

[~jamestaylor]
Yes, it creates both the table data and the index data (HFiles) in a single 
job.

The patch is an alpha version; I built it as a preview.

Eventually it will become part of CsvImportTsv.




[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-07 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053505#comment-14053505
 ] 

jay wong commented on PHOENIX-1056:
---

The normal HBase ImportTsv is:

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:a,cf:b,cf:c \
  -Dimporttsv.bulk.output=hdfs://storefile-outputdir tablename hdfs-inputdir

The Phoenix ImportTsv is the following, and it supports Phoenix data types:

bin/hbase org.apache.hadoop.hbase.mapreduce.PhoneixImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,CF:A:PH_INT,CF:B:PH:BIGINT,cf:c \
  -Dimporttsv.index.all=true -Dimporttsv.bulk.output=hdfs://storefile-outputdir \
  tablename hdfs-inputdir

If the primary key spans multiple columns, the rule is to replace 
HBASE_ROW_KEY with HBASE_ROW_KEY^CF1:Q1:PH_INT^CF2:Q2^0^CF1:Q3:PH_INT.

Parameters:
-Dimporttsv.index.all=true: build all index table data (default is false).
-Dimporttsv.build.table=true: build the data table (default is true).
-Dimporttsv.index.names=INDEX1,INDEX2: which index tables to build; requires 
-Dimporttsv.index.all=false.
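The composite-key rule above could be interpreted roughly as follows (a 
hypothetical Python sketch, not the patch's actual parser; in particular, 
treating the bare 0 component as a literal is a guess):

```python
# Hypothetical parser for the composite row-key spec quoted above:
# components are separated by '^', each being CF:QUALIFIER[:TYPE].
def parse_row_key_spec(spec):
    """Split a HBASE_ROW_KEY^CF:Q:TYPE^... spec into key components."""
    parts = spec.split("^")
    assert parts[0] == "HBASE_ROW_KEY", "spec must start with HBASE_ROW_KEY"
    components = []
    for part in parts[1:]:
        fields = part.split(":")
        if len(fields) >= 2:                 # CF:Q or CF:Q:TYPE
            cf, qualifier = fields[0], fields[1]
            ptype = fields[2] if len(fields) > 2 else None
            components.append((cf, qualifier, ptype))
        else:                                # bare token, e.g. the '0'
            components.append(("LITERAL", part, None))
    return components

spec = "HBASE_ROW_KEY^CF1:Q1:PH_INT^CF2:Q2^0^CF1:Q3:PH_INT"
print(parse_row_key_spec(spec))
```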








[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-07 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053507#comment-14053507
 ] 

jay wong commented on PHOENIX-1056:
---

Anyway, in my ImportTsv:

It supports multi-character or unicode separators, whereas the Apache code 
only supports a single-byte separator, e.g.:
  -Dimporttsv.separator=\001
  -Dimporttsv.separator=\u0019
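A minimal sketch of what such separator handling looks like (plain Python for 
illustration; this is not the patch's code):

```python
# Splitting records on a control-character or unicode separator such as
# \001 or \u0019, which is what the separator support above enables.
def split_record(line, separator="\u0001"):
    """Split one input line on an arbitrary (possibly non-ASCII) separator."""
    return line.split(separator)

print(split_record("row1\u0001hello\u0001world"))      # ['row1', 'hello', 'world']
print(split_record("a\u0019b", separator="\u0019"))    # ['a', 'b']
```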



[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-04 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052268#comment-14052268
 ] 

James Taylor commented on PHOENIX-1056:
---

Fantastic work, [~jaywong]! Would love to get this into Phoenix. Can you send 
us a pull request so folks can review it?
