[jira] [Resolved] (HIVE-17102) Example For Vectorized Execution in Hive in Cwiki not Seems to Work

2017-07-15 Thread anubhav tarar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anubhav tarar resolved HIVE-17102.
--
Resolution: Fixed

> Example For Vectorized Execution in Hive in Cwiki not Seems to Work
> ---
>
> Key: HIVE-17102
> URL: https://issues.apache.org/jira/browse/HIVE-17102
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: anubhav tarar
>Assignee: anubhav tarar
>
> I tried vectorized execution in Hive by following the Hive cwiki, but the 
> example does not seem to work.
> Step 1: create an ORC table
> hive> create table Addresses (
> >   name string,
> >   street string,
> >   city string,
> >   state string,
> >   zip int
> > ) stored as orc tblproperties ("orc.compress"="NONE");
> Step 2: insert values into the table 
> hive> insert into Addresses values('anubhav','ggn','ggn','haryana','122001');
> Query ID = hduser_20170716093152_14774003-d2c4-4620-b773-ca17cafd902b
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Listening for transport dt_socket at address: 5005
> Job running in-process (local Hadoop)
> 2017-07-16 09:31:59,689 Stage-1 map = 100%,  reduce = 0%
> Ended Job = job_local1858411694_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to: 
> hdfs://localhost:54310/user/hive/warehouse/addresses/.hive-staging_hive_2017-07-16_09-31-52_428_7861150459629073282-1/-ext-1
> Loading data to table default.addresses
> Table default.addresses stats: [numFiles=1, numRows=1, totalSize=713, 
> rawDataSize=360]
> MapReduce Jobs Launched: 
> Stage-Stage-1:  HDFS Read: 778 HDFS Write: 818 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> Step 3: query the table with the EXPLAIN command
> hive> set hive.vectorized.execution.enabled = true;
> hive> explain select name from Addresses where zip>1;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: addresses
>   Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE Column 
> stats: NONE
>   Filter Operator
> predicate: (zip > 1) (type: boolean)
> Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: name (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE 
> Column stats: NONE
>   ListSink
> Time taken: 0.081 seconds, Fetched: 20 row(s)
> Note: the EXPLAIN output shows that no vectorized reader is applied.
> The reason for the failure is that when Fetch is used in the plan instead of 
> Map, the query is not vectorized.
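
One way to reproduce a vectorized plan for this example is to disable fetch-task 
conversion so the scan runs in a Map task instead of the client-side Fetch 
operator. A minimal sketch; the exact EXPLAIN text varies by Hive version:

{code}
set hive.vectorized.execution.enabled = true;
-- disable fetch-task conversion so a Map-stage plan is produced
set hive.fetch.task.conversion = none;
explain select name from Addresses where zip > 1;
-- the Map-side operator tree should then report "Execution mode: vectorized"
{code}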



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17102) Example For Vectorized Execution in Hive in Cwiki not Seems to Work

2017-07-15 Thread anubhav tarar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anubhav tarar updated HIVE-17102:
-
Description: 
I tried vectorized execution in Hive by following the Hive cwiki, but the 
example does not seem to work.

Step 1: create an ORC table

hive> create table Addresses (
>   name string,
>   street string,
>   city string,
>   state string,
>   zip int
> ) stored as orc tblproperties ("orc.compress"="NONE");

Step 2: insert values into the table 

hive> insert into Addresses values('anubhav','ggn','ggn','haryana','122001');
Query ID = hduser_20170716093152_14774003-d2c4-4620-b773-ca17cafd902b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Listening for transport dt_socket at address: 5005
Job running in-process (local Hadoop)
2017-07-16 09:31:59,689 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local1858411694_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: 
hdfs://localhost:54310/user/hive/warehouse/addresses/.hive-staging_hive_2017-07-16_09-31-52_428_7861150459629073282-1/-ext-1
Loading data to table default.addresses
Table default.addresses stats: [numFiles=1, numRows=1, totalSize=713, 
rawDataSize=360]
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 778 HDFS Write: 818 SUCCESS
Total MapReduce CPU Time Spent: 0 msec

Step 3: query the table with the EXPLAIN command
hive> set hive.vectorized.execution.enabled = true;

hive> explain select name from Addresses where zip>1;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: addresses
  Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE Column 
stats: NONE
  Filter Operator
predicate: (zip > 1) (type: boolean)
Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: name (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE 
Column stats: NONE
  ListSink

Time taken: 0.081 seconds, Fetched: 20 row(s)

Note: the EXPLAIN output shows that no vectorized reader is applied.

The reason for the failure is that when Fetch is used in the plan instead of 
Map, the query is not vectorized.



  was:
I tried vectorized execution in Hive by following the Hive cwiki, but the 
example does not seem to work.

Step 1: create an ORC table

hive> create table Addresses (
>   name string,
>   street string,
>   city string,
>   state string,
>   zip int
> ) stored as orc tblproperties ("orc.compress"="NONE");

Step 2: insert values into the table 

hive> insert into Addresses values('anubhav','ggn','ggn','haryana','122001');
Query ID = hduser_20170716093152_14774003-d2c4-4620-b773-ca17cafd902b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Listening for transport dt_socket at address: 5005
Job running in-process (local Hadoop)
2017-07-16 09:31:59,689 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local1858411694_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: 
hdfs://localhost:54310/user/hive/warehouse/addresses/.hive-staging_hive_2017-07-16_09-31-52_428_7861150459629073282-1/-ext-1
Loading data to table default.addresses
Table default.addresses stats: [numFiles=1, numRows=1, totalSize=713, 
rawDataSize=360]
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 778 HDFS Write: 818 SUCCESS
Total MapReduce CPU Time Spent: 0 msec

Step 3: query the table with the EXPLAIN command
hive> set hive.vectorized.execution.enabled = true;

hive> explain select name from Addresses where zip>1;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: addresses
  Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE Column 
stats: NONE
  Filter Operator
predicate: (zip > 1) (type: boolean)
Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: name (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE 
Column stats: NONE
  ListSink

Time taken: 0.081 seconds, Fetched: 20 row(s)

Note: the EXPLAIN output shows that no vectorized reader is applied.


I updated the Hive cwiki accordingly: 

https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution

> Example For Vectorized Execution in Hive in Cwiki not Seems to Work

[jira] [Assigned] (HIVE-17102) Example For Vectorized Execution in Hive in Cwiki not Seems to Work

2017-07-15 Thread anubhav tarar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anubhav tarar reassigned HIVE-17102:



> Example For Vectorized Execution in Hive in Cwiki not Seems to Work
> ---
>
> Key: HIVE-17102
> URL: https://issues.apache.org/jira/browse/HIVE-17102
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: anubhav tarar
>Assignee: anubhav tarar
>
> I tried vectorized execution in Hive by following the Hive cwiki, but the 
> example does not seem to work.
> Step 1: create an ORC table
> hive> create table Addresses (
> >   name string,
> >   street string,
> >   city string,
> >   state string,
> >   zip int
> > ) stored as orc tblproperties ("orc.compress"="NONE");
> Step 2: insert values into the table 
> hive> insert into Addresses values('anubhav','ggn','ggn','haryana','122001');
> Query ID = hduser_20170716093152_14774003-d2c4-4620-b773-ca17cafd902b
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Listening for transport dt_socket at address: 5005
> Job running in-process (local Hadoop)
> 2017-07-16 09:31:59,689 Stage-1 map = 100%,  reduce = 0%
> Ended Job = job_local1858411694_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to: 
> hdfs://localhost:54310/user/hive/warehouse/addresses/.hive-staging_hive_2017-07-16_09-31-52_428_7861150459629073282-1/-ext-1
> Loading data to table default.addresses
> Table default.addresses stats: [numFiles=1, numRows=1, totalSize=713, 
> rawDataSize=360]
> MapReduce Jobs Launched: 
> Stage-Stage-1:  HDFS Read: 778 HDFS Write: 818 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> Step 3: query the table with the EXPLAIN command
> hive> set hive.vectorized.execution.enabled = true;
> hive> explain select name from Addresses where zip>1;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: addresses
>   Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE Column 
> stats: NONE
>   Filter Operator
> predicate: (zip > 1) (type: boolean)
> Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: name (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE 
> Column stats: NONE
>   ListSink
> Time taken: 0.081 seconds, Fetched: 20 row(s)
> Note: the EXPLAIN output shows that no vectorized reader is applied.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088801#comment-16088801
 ] 

Lefty Leverenz commented on HIVE-16793:
---

Thanks for the doc, [~vgarg] -- it looks good.

Question:  Should sq_count_check be documented along with the other UDFs, or is 
it for internal use only?  HIVE-15544 introduced it, so if sq_count_check needs 
to be documented we should update the doc note there.
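
For reference, a rough sketch of where the check matters, using the {{part}} table 
from the plan below. The queries are illustrative, and the runtime behavior described 
in the comments is an assumption based on this issue and HIVE-15544:

{code}
-- check needed: the scalar subquery could return more than one row at runtime,
-- so the plan carries an sq_count_check branch to guard against that
select * from part where p_size > (select p_size from part where p_type = '1');

-- check redundant (this issue): the constant group-by key makes the subquery
-- provably single-row, so the sq_count_check branch adds nothing
select * from part
where p_size > (select max(p_size) from part where p_type = '1' group by p_type);
{code}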

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Fix For: 3.0.0
>
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch, HIVE-16793.6.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator 

[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088785#comment-16088785
 ] 

Lefty Leverenz commented on HIVE-4577:
--

Doc note:  This should be documented in two wikidocs that describe the dfs 
command:

* [Commands | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Commands]
* [CLI -- Hive Interactive Shell Commands | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands]

Added a TODOC3.0 label.  (A TODOC2.2.0 label should also be added if the patch 
gets committed to branch-2.2.)

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
>  Labels: TODOC3.0
> Fix For: 2.2.0, 3.0.0
>
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> As designed, Hive supports hadoop dfs commands in the Hive shell, e.g. 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
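
For comparison, a sketch of the behavior one would expect from the plain Hadoop 
shell, where bash strips the quotes before the path reaches HDFS (the /user/biadmin 
home directory is assumed, matching the listings above):

{code}
hadoop fs -mkdir "hello"       # creates /user/biadmin/hello
hadoop fs -mkdir 'world'       # creates /user/biadmin/world
hadoop fs -mkdir "bei jing"    # creates a single directory: /user/biadmin/bei jing
{code}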



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-4577:
-
Labels: TODOC3.0  (was: )

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
>  Labels: TODOC3.0
> Fix For: 2.2.0, 3.0.0
>
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> As designed, Hive supports hadoop dfs commands in the Hive shell, e.g. 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088781#comment-16088781
 ] 

Lefty Leverenz commented on HIVE-4577:
--

[~vgumashta], is this also going to be committed to branch-2.2?  If not, the 
fix version should only show 3.0.0.  Thanks.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Fix For: 2.2.0, 3.0.0
>
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> As designed, Hive supports hadoop dfs commands in the Hive shell, e.g. 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088775#comment-16088775
 ] 

Lefty Leverenz commented on HIVE-16996:
---

Doc note:  This adds *hive.stats.ndv.algo* to HiveConf.java, so it needs to be 
documented in the Statistics section of Configuration Properties.

* [Configuration Properties -- Statistics | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics]

Added a TODOC3.0 label.
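
Once documented, usage would presumably look like the sketch below; the property 
name comes from this issue, while the value names hll and fm are assumptions based 
on the issue title:

{code}
-- choose the NDV estimator used when computing column statistics
-- (assumed values: hll for HyperLogLog, fm for the FM sketch)
set hive.stats.ndv.algo=hll;
analyze table part compute statistics for columns;
{code}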

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, 
> HIVE-16966.06.patch, HIVE-16966.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16996:
--
Labels: TODOC3.0  (was: )

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, 
> HIVE-16966.06.patch, HIVE-16966.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16997) Extend object store to store bit vectors

2017-07-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088762#comment-16088762
 ] 

Pengcheng Xiong commented on HIVE-16997:


After transferring each bit vector in the FM sketch to 4 bytes, we need 
1024*4 = 4096 bytes for the FM sketch.

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)

2017-07-15 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088721#comment-16088721
 ] 

Carter Shanklin commented on HIVE-17018:


My inputs:

* That particular variable can't be renamed to something Spark-specific, since all 
engines use it.
* Adding a net new variable for Spark would increase confusion rather than 
decrease it.

It would be good to have some sort of descriptive name that applies to both Tez 
and MR. As pointed out, there is no relation between what that variable used to 
do and what it does today, and the implication of changing that parameter is 
difficult to guess.

Maybe a new variable like hive.auto.convert.join.max.hashtable.size could be 
introduced. Both engines would switch to that variable at some point; then usage of 
the old variable could be deprecated and eventually removed.

Just my inputs. /cc [~ashutoshc]

> Small table is converted to map join even the total size of small tables 
> exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
> We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the total size of n-1 of the tables/partitions in an n-way join is smaller 
> than it, the join is converted to a map join. For example, consider A join B 
> join C join D join E, where the big table is A (100M) and the small tables are 
> B (10M), C (10M), D (10M), and E (10M). If we set 
> hive.auto.convert.join.noconditionaltask.size=20M, the current code converts 
> E, D, and B to map joins, but not C. In my understanding, because 
> hive.auto.convert.join.noconditionaltask.size can only accommodate E and D, 
> neither C nor B should be converted to a map join.
> Let's explain in more detail why E can be converted to a map join.
> In the current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] 
> calculates all the map joins in the parent path and child path. The search 
> stops when encountering a [UnionOperator or 
> ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. 
> C is not converted to a map join because {{(connectedMapJoinSize + 
> totalSize) > maxSize}} [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], so the 
> RS before the join of C remains. When calculating whether B will be 
> converted to a map join, {{getConnectedMapJoinSize}} returns 0 on encountering that 
> [RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409], 
> so {{(connectedMapJoinSize + totalSize) < maxSize}} holds.
> [~xuefuz] or [~jxiang]: can you help check whether this is a bug, as you 
> are more familiar with SparkJoinOptimizer?
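
For context, a minimal sketch of the settings involved in the example above 
(existing property names; the byte value mirrors the 20M threshold from the 
description):

{code}
set hive.auto.convert.join = true;
set hive.auto.convert.join.noconditionaltask = true;
-- the threshold discussed above: the summed size of the n-1 small tables must stay below it
set hive.auto.convert.join.noconditionaltask.size = 20971520;  -- 20 MB
{code}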



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088685#comment-16088685
 ] 

Pengcheng Xiong commented on HIVE-16996:


Updated the related q file changes and pushed to master. Thanks [~ashutoshc] and 
[~prasanth_j] for the review.

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, 
> HIVE-16966.06.patch, HIVE-16966.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Fix Version/s: 3.0.0

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, 
> HIVE-16966.06.patch, HIVE-16966.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, 
> HIVE-16966.06.patch, HIVE-16966.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-15 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088596#comment-16088596
 ] 

Teddy Choi commented on HIVE-12631:
---

It looks like there's no failing test that is caused by this issue anymore.

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, 
> HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, 
> HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, 
> HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, the ACID logic is embedded inside the ORC format; we need to 
> refactor it to sit on top of some interface, if practical, or just port it to 
> the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.
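
For readers less familiar with ACID tables, a minimal sketch of the kind of table 
this read path would have to serve (standard transactional-table DDL; {{acid_t}} is 
a hypothetical name, and hive.llap.io.enabled is the existing LLAP IO switch):

{code}
set hive.support.concurrency = true;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
create table acid_t (id int, val string)
clustered by (id) into 4 buckets
stored as orc tblproperties ('transactional'='true');
-- with ACID support in the LLAP read path, scans of acid_t could use the IO elevator
set hive.llap.io.enabled = true;
{code}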



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-07-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088570#comment-16088570
 ] 

Hive QA commented on HIVE-16990:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877445/HIVE-16990.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11067 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_2]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_ts_stats_for_mapjoin]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=167)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6050/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6050/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6050/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877445 - PreCommit-HIVE-Build

> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below:
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.
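
A hedged sketch of the replication flow this ordering protects (standard REPL 
commands; the database name and dump path are placeholders):

{code}
-- on the source cluster
repl dump default;
-- on the target cluster: per the rule above, the last repl ID should be
-- recorded only after both the metadata and the data files are copied successfully
repl load default from '/tmp/hive/repl/<dump-dir>';
repl status default;   -- reports the last replicated event ID
{code}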



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17097) Fix SemiJoinHint parsing in SemanticAnalyzer

2017-07-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-17097:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks [~gopalv], [~djaiswal].  Committed to master. 

> Fix SemiJoinHint parsing in SemanticAnalyzer
> 
>
> Key: HIVE-17097
> URL: https://issues.apache.org/jira/browse/HIVE-17097
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17097.1.patch, HIVE-17097.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088551#comment-16088551
 ] 

Hive QA commented on HIVE-12631:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877443/HIVE-12631.23.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11055 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_2]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_ts_stats_for_mapjoin]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=108)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6049/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6049/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6049/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877443 - PreCommit-HIVE-Build

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, 
> HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, 
> HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, 
> HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, the ACID logic is embedded inside the ORC format; we need to 
> refactor it to sit on top of some interface, if practical, or just port it to 
> the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-07-15 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16990:

Attachment: HIVE-16990.02.patch

Added 02.patch with fix for the pre-commit test failures.


> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below:
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-07-15 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16990:

Status: Patch Available  (was: Open)

> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below:
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-07-15 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16990:

Status: Open  (was: Patch Available)

> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below:
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-15 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.23.patch

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, 
> HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, 
> HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, 
> HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, the ACID logic is embedded inside the ORC format; we need to 
> refactor it to sit on top of some interface, if practical, or just port it to 
> the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)