[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296784#comment-15296784
 ] 

Sergey Shelukhin commented on HIVE-9660:


[~owen.omalley] lots of ORC tests failed that may be related... also it looks 
like all the Tez tests got stuck, not sure if that's related or just HiveQA 
(they didn't get stuck in other jiras though)

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch, 
> HIVE-9660.patch, owen-hive-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295469#comment-15295469
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12805265/HIVE-9660.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 119 failed/errored test(s), 9855 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-constprog_dpp.q-dynamic_partition_pruning.q-vectorization_10.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-cte_4.q-vector_non_string_partition.q-delete_where_non_partitioned.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-dynpart_sort_optimization2.q-tez_dynpart_hashjoin_3.q-orc_vectorization_ppd.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-explainuser_4.q-update_after_multiple_inserts.q-mapreduce2.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-mapjoin_mapjoin.q-insert_into1.q-vector_decimal_2.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-order_null.q-vector_acid3.q-orc_merge10.q-and-12-more - 
did not produce a TEST-*.xml file
TestMiniTezCliDriver-smb_cache.q-transform_ppr2.q-vector_outer_join0.q-and-5-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_coalesce.q-cbo_windowing.q-tez_join.q-and-12-more - 
did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_16.q-vector_decimal_round.q-orc_merge6.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-auto_join30.q-join2.q-input17.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-avro_joins.q-join36.q-join1.q-and-12-more - did not produce 
a TEST-*.xml file
TestSparkCliDriver-groupby2.q-custom_input_output_format.q-join41.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-skewjoinopt8.q-union_remove_1.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-load_dyn_part5.q-load_dyn_part2.q-skewjoinopt16.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-order.q-auto_join18_multi_distinct.q-union2.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-skewjoin_noskew.q-sample2.q-skewjoinopt10.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-vector_distinct_2.q-join15.q-load_dyn_part3.q-and-12-more - 
did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_deleteAnalyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_uncompressed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_file_dump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_fast_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292347#comment-15292347
 ] 

ASF GitHub Bot commented on HIVE-9660:
--

GitHub user omalley opened a pull request:

https://github.com/apache/hive/pull/77

HIVE-9660 Add length to ORC indexes so that the reader knows how much to 
read.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hive hive-9660

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/77.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #77


commit 014e9aaec1cb8f7257b997e953e6cc30d34a71cf
Author: Owen O'Malley 
Date:   2016-03-26T02:39:12Z

HIVE-11417. Move the ReaderImpl and RowReaderImpl to the ORC module,
by making shims for the row by row reader.

commit afda4610a8c1ed9fe3adc86c6fc1b08b5fdae7aa
Author: Owen O'Malley 
Date:   2016-05-13T21:44:34Z

HIVE-9660 Add length to ORC indexes so that the reader knows how much
to read.




> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch, 
> owen-hive-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267781#comment-15267781
 ] 

Sergey Shelukhin commented on HIVE-9660:


Hmm. I see, the main difference is that one could track the finished RGs and 
record the length at the end based on stream position, instead of tracking all 
the length changes attributed to the RG while it's active... this will change 
the set-of-active-rgs to set-of-just-finished-rgs (of which there can still be 
several per CB, or RL block), and move tracking logic around to different 
places. The dictionary stuff will still have to be there because the 
direct/dictionary flush each write streams that are separated into RGs out of 
sync with the main writer (data+length for direct, data for dictionary). I am 
not sure if it's worth it at this point... I could change the existing patch to 
do that, or do it in separate JIRA later. If you want to do it from scratch 
that also works ;)

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-02 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267223#comment-15267223
 ] 

Owen O'Malley commented on HIVE-9660:
-

Callbacks are only added when a row group finishes, which is the only time that 
anyone cares. So nothing happens per a row, only at the row group boundary. The 
flow looks like:

start of stripe:
* record position

end of row group:
* create callbacks for each data stream (not the index or dictionary streams)
* record the position for the next row group

end of stripe:
* flush everything to ensure all the callbacks happen

nothing happens per a row or rowbatch.


> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267179#comment-15267179
 ] 

Sergey Shelukhin commented on HIVE-9660:


{noformat}
The run length encoder doesn't perform the callback, but when its RLE block is 
finished passes the same callback to the OutStream for when the OutStream 
finishes the next compression block. Thus it is easy to guarantee that you only 
get called back when compression block finishes after the RLE finishes, which 
is the required condition. Obviously, for cases where there isn't an RLE, it 
just puts the callback directly on the OutStream and it works exactly the same 
way.
{noformat}
RG can have several RLE blocks; RLE block can contain several RGs. Moreover, in 
case of a boolean writer, there are two levels of buffering - the byte, and the 
RLE buffer in the underlying byte writer.

There's also the issue of dictionaries and strings, where isPresent is written 
normally but the entries cannot be finalized.
In general, I feel like all the coordination complexity will still be 
necessary, it would just end up moving around a bit.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-02 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267118#comment-15267118
 ] 

Owen O'Malley commented on HIVE-9660:
-

{quote}
Note that the run length blocks finish before CBs (ie RL first, then CB 
containing the RL), so the callbacks are actually reversed.
{quote}

They can happen in *either* order, but the length must be computed when the 
compression block finishes AFTER the rle block finishes.

{quote}
For uncompressed, the main concern is that for exact boundaries, there will be 
too many calls.
{quote}

I don't understand this sentence. There will be a call per stream per a row 
group, that is hardly a problem.

{quote}
You'd need to pass a callback per RG down to the RL writer (and in some cases 
there isn't even an RL writer, like double), but RL writer won't know when a RG 
ends. 
{quote}

The run length encoder doesn't perform the callback, but when its RLE block is 
finished passes the same callback to the OutStream for when the OutStream 
finishes the next compression block. Thus it is easy to guarantee that you only 
get called back when compression block finishes after the RLE finishes, which 
is the required condition. Obviously, for cases where there isn't an RLE, it 
just puts the callback directly on the OutStream and it works exactly the same 
way.


> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267073#comment-15267073
 ] 

Sergey Shelukhin commented on HIVE-9660:


Btw, I just realized it would actually not even be cleaner, if I understand it 
correctly. You'd need to pass a callback per RG down to the RL writer (and in 
some cases there isn't even an RL writer, like double), but RL writer won't 
know when a RG ends. So you;d need to tell the RL writer which callback-RGs are 
done, and then when RL block ends they can send those down. That seems like a 
roundabout way of doing it instead of just coordinating  in one place.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-05-02 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266769#comment-15266769
 ] 

Owen O'Malley commented on HIVE-9660:
-

After looking at this patch, I feel like we can do it more cleanly. I'd propose 
that we:
* add a capability to register callbacks on PositionedOutputStream that get 
called immediately if there are no uncompressed bytes, or after the next 
compression block finishes.
* add a similar capability to the run length encoders that wait until the end 
of the current run and then pass the callback down to the 
PositionedOutputStream.
* the ORC WriterImpl then creates callbacks that finalize the RowIndexEntry 
when all of the streams for that column have completed their run length 
encoding block and compression block.

This makes most of the column types really straightforward. The only one that 
is a mess is the string column types because of the delayed writing caused by 
the dictionary.

I should have a first draft of such a patch today for everyone to look at.

Thoughts?

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264416#comment-15264416
 ] 

Sergey Shelukhin commented on HIVE-9660:


[~prasanth_j] this is now ready for +1 :)

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264415#comment-15264415
 ] 

Sergey Shelukhin commented on HIVE-9660:


Some test failures are caused by metastore issues, and some are broken by other 
jiras it appears

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263918#comment-15263918
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800915/HIVE-9660.11.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 53 failed/errored test(s), 9894 tests 
executed
*Failed tests:*
{noformat}
TestHBaseAggrStatsCacheIntegration - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_join30.q-script_pipe.q-vector_decimal_10_0.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-explainuser_4.q-update_after_multiple_inserts.q-mapreduce2.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-order_null.q-vector_acid3.q-orc_merge10.q-and-12-more - 
did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_distinct_2.q-tez_joins_explain.q-cte_mat_1.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_varchar_4.q-smb_cache.q-tez_join_hash.q-and-8-more 
- did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257127#comment-15257127
 ] 

Sergey Shelukhin commented on HIVE-9660:


That is pretty much it. There are some more detailed descriptions in the 
comments. The two complex bits are the integer writers that have their separate 
caches, so one needs to be aware when accounting for a CB that, even though 
some RGs might be fully written, their values could still be in the integer 
writer literals array (or a similar place), and not in this CB. 
Another is the string writer, which is logically simple (we save index entries 
as before, only this time we have to make sure when writing stuff out that we 
maintain a correct set of active RGs for those CB callbacks), but a little bit 
involved code-wise.

I'll look at test failures, I think the last patch was supposed to pass all the 
tests before rebase, probably some stupid error.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-25 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256560#comment-15256560
 ] 

Owen O'Malley commented on HIVE-9660:
-

I guess my assumption was that you would make a callback from the underlying 
stream and when a compression buffer finished, you would record a length for 
any pending RG.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-25 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256542#comment-15256542
 ] 

Owen O'Malley commented on HIVE-9660:
-

I don't think we need to bump up the writer version for this change, because 
the reader can tell if the protobuf has the field or not. WriterVersions are 
typically reserved for bugs in the writer where the reader needs to work around 
bugs.

Can you give a top level view on how you are approaching adding the lengths?


> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255649#comment-15255649
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800288/HIVE-9660.10.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 73 failed/errored test(s), 9947 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_file_dump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_lengths
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.TestTxnCommands2.testBucketizedInputFormat
org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn
org.apache.hadoop.hive.ql.TestTxnCommands2.testInitiatorWithMultipleFailedCompactions
org.apache.hadoop.hive.ql.TestTxnCommands2.testOrcNoPPD
org.apache.hadoop.hive.ql.TestTxnCommands2.testOrcPPD
org.apache.hadoop.hive.ql.TestTxnCommands2.testUpdateMixedCase
org.apache.hadoop.hive.ql.io.orc.TestColumnStatistics.testHasNull
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter2
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestJsonFileDump.testJsonDump
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.lockConflictDbTable
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240703#comment-15240703
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12798085/HIVE-9660.09.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7583/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7583/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7583/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-7583/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
>From https://github.com/apache/hive
   529580f..0dd4621  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 529580f HIVE-13486: Cast the column type for column masking 
(Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 0dd4621 HIVE-12159: Create vectorized readers for the complex 
types (Owen O'Malley, reviewed by Matt McCline)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12798085 - PreCommit-HIVE-TRUNK-Build

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-11 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236360#comment-15236360
 ] 

Prasanth Jayachandran commented on HIVE-9660:
-

Left some more comments in RB. Will it be easy to add unit tests for these?

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236003#comment-15236003
 ] 

Sergey Shelukhin commented on HIVE-9660:


Test failures don't look related, typical recent metastore timeouts.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235977#comment-15235977
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797846/HIVE-9660.08.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 54 failed/errored test(s), 9885 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-schema_evol_orc_acidvec_mapwork_part.q-vector_partitioned_date_time.q-vector_non_string_partition.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-schema_evol_text_nonvec_mapwork_table.q-vector_left_outer_join2.q-vector_outer_join5.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_acid3.q-vector_decimal_trailing.q-lvj_mapjoin.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_partition_diff_num_cols.q-vectorization_10.q-orc_merge9.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dyn_part_max
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.metastore.hbase.TestHBaseImport.org.apache.hadoop.hive.metastore.hbase.TestHBaseImport
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.concurrencyFalse
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDDLExclusive
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDelete
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadPartition
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleWriteTable
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233018#comment-15233018
 ] 

Sergey Shelukhin commented on HIVE-9660:


Hmm, looks like recent refactoring broke a bunch of stuff. I will take a look.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, HIVE-9660.patch, 
> HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232951#comment-15232951
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797641/HIVE-9660.07.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 172 failed/errored test(s), 9771 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file
TestSparkCliDriver-auto_join30.q-vector_data_types.q-scriptfile1.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-auto_join9.q-bucketmapjoin11.q-smb_mapjoin_2.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-date_udf.q-join23.q-auto_join4.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby4.q-timestamp_null.q-auto_join23.q-and-12-more - did 
not produce a TEST-*.xml file
TestSparkCliDriver-ppd_gby_join.q-groupby_rollup1.q-auto_sortmerge_join_4.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join3.q-union26.q-load_dyn_part15.q-and-12-more - did 
not produce a TEST-*.xml file
TestSparkCliDriver-stats13.q-groupby6_map.q-join_casesensitive.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-vector_distinct_2.q-input17.q-load_dyn_part2.q-and-12-more - 
did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_file_dump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_lengths
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_min_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_fast_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_aggregate_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_binary_join_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_char_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_data_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_distinct_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_number_compare_projection
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_reduce1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_reduce2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_reduce3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_varchar_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part_varchar
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_ppd_basic
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_orig_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_ppd_basic

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232946#comment-15232946
 ] 

Prasanth Jayachandran commented on HIVE-9660:
-

I still don't think we need a config for writer. I can see that the config is 
added to avoid writing wrong lengths or disable that feature. But the problem 
is that the we won't be able to identify the files that are already written 
wrongly. So I would recommend bumping up the writerVersion to reflect this jira 
(HIVE-9660). With this we can identify files that are written after HIVE-9660. 
In future if we find anything wrong, we bump up the writerVersion again and 
make reader resilient by ignoring lengths from files written with HIVE-9660. 
There should also be a reader config that use lengths when available or 
fallback to old codepath.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, HIVE-9660.patch, 
> HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232625#comment-15232625
 ] 

Sergey Shelukhin commented on HIVE-9660:


It doesn't look like RB is working correctly. I cannot get the patch to 
display. Recent patch may need to be reviewed by applying and diff-ing 2 
branches locally..

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, HIVE-9660.patch, 
> HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-06 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229332#comment-15229332
 ] 

Prasanth Jayachandran commented on HIVE-9660:
-

Posted some comments in RB. I will have to do another pass to better understand 
things in clear mind :). 

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, HIVE-9660.patch, 
> HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228695#comment-15228695
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797148/HIVE-9660.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7490/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7490/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7490/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-shims ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-shims ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp/conf
 [copy] Copying 15 files to 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-shims ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/hive-shims-2.1.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-shims ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/hive-shims-2.1.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-shims/2.1.0-SNAPSHOT/hive-shims-2.1.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/shims/aggregator/pom.xml 
to 
/data/hive-ptest/working/maven/org/apache/hive/hive-shims/2.1.0-SNAPSHOT/hive-shims-2.1.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Storage API 2.1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-storage-api ---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/storage-api/target
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/storage-api (includes = 
[datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-storage-api ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-storage-api ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-storage-api ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/storage-api/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-storage-api ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-storage-api ---
[INFO] Compiling 35 source files to 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/IntervalDayTimeColumnVector.java:[29,51]
 sun.util.calendar.BaseCalendar is internal proprietary API and may be removed 
in a 

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226889#comment-15226889
 ] 

Sergey Shelukhin commented on HIVE-9660:


SparkOnYarn entirely failed due to some timeouts. Need to update outputs for 3 
more tests...

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225829#comment-15225829
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12796909/HIVE-9660.03.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9977 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testSimpleTable
org.apache.hadoop.hive.ql.io.orc.TestJsonFileDump.testJsonDump
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7474/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7474/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7474/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12796909 - PreCommit-HIVE-TRUNK-Build

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223652#comment-15223652
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12796643/HIVE-9660.02.patch

{color:green}SUCCESS:{color} +1 due to 10 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 65 failed/errored test(s), 9975 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_file_dump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_fast_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union_fast_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-03-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218069#comment-15218069
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12795693/HIVE-9660.01.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 121 failed/errored test(s), 9891 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-parallel_join0.q-union_remove_9.q-smb_mapjoin_21.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_file_dump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_fast_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union_fast_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.TestTxnCommands2.writeBetweenWorkerAndCleaner
org.apache.hadoop.hive.ql.io.orc.TestColumnStatistics.testHasNull
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter2
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDataDump

[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-03-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211170#comment-15211170
 ] 

Hive QA commented on HIVE-9660:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12794925/HIVE-9660.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7359/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7359/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7359/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-7359/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
>From https://github.com/apache/hive
   db2efe4..1787082  branch-1   -> origin/branch-1
+ git reset --hard HEAD
HEAD is now at d3a5f20 HIVE-13325: Excessive logging when ORC PPD fails type 
conversions (Prasanth Jayachandran reviewed by Gopal V)
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java.orig
Removing ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java.orig
Removing ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java.orig
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at d3a5f20 HIVE-13325: Excessive logging when ORC PPD fails type 
conversions (Prasanth Jayachandran reviewed by Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12794925 - PreCommit-HIVE-TRUNK-Build

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-03-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207820#comment-15207820
 ] 

Sergey Shelukhin commented on HIVE-9660:


[~prasanth_j] fyi

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-03-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200794#comment-15200794
 ] 

Sergey Shelukhin commented on HIVE-9660:


The fundamental problem with this patch is that logical writers (e.g. RLE 
writer) buffer the data. And for some writers like bit writer, we cannot even 
force the flush at the end of the RG, which would have solved this problem at 
some small size cost (all the encoding segments would have to terminate at RG 
boundaries). 

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.WIP2.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-03-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188622#comment-15188622
 ] 

Sergey Shelukhin commented on HIVE-9660:


[~gopalv] fyi. I am working on this.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)