Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-06-28 Thread Boglarka Egyed

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review205513
---



Hi Daniel,

Thanks for updating your patch, I think we are very close to committing it. 
Could you please rebase it onto the latest trunk, as some new changes have been 
committed recently? :)

Thanks,
Bogi

- Boglarka Egyed


On May 2, 2018, 12:12 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> ---
> 
> (Updated May 2, 2018, 12:12 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
> https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then 
> CTAS from that. This would push the responsibility of creating the ORC format 
> to Hive. However, it would result in writing every record twice: in text 
> format and in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
> micromanaged) ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -
> 
>   ivy.xml 1f587f3e 
>   ivy/libraries.properties 565a8bf5 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/export-purpose.txt def6ead3 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import-purpose.txt c7eca606 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 3b542102 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java c62ee98c 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2c474b7e 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestAllTables.java 16933a82 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java f6d591b7 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
>   src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/12/
> 
> 
> Testing
> ---
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>



Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-06-14 Thread Boglarka Egyed

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review204773
---



Hi Daniel,

Thanks for this improvement, your change generally LGTM. I also like the number 
of test cases you attached to this change!

I have a couple of minor findings, please find them below. Also, I ran the unit 
tests and the newly added TestOrcConversionContext failed for me with the 
following error message:

Testcase: testGetConverterDatetimeTypes took 0.011 sec
Caused an ERROR
org.apache.orc.storage.serde2.io.DateWritable cannot be cast to 
org.apache.hadoop.hive.serde2.io.DateWritable
java.lang.ClassCastException: org.apache.orc.storage.serde2.io.DateWritable 
cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable
at 
org.apache.sqoop.util.TestOrcConversionContext.testGetConverterDatetimeTypes(TestOrcConversionContext.java:97)

Could you please take a look?

Thanks in advance,
Bogi


src/java/org/apache/sqoop/util/OrcUtil.java
Lines 38-47 (patched)


The input parameter `overrides` is missing from the javadoc here.



src/test/org/apache/sqoop/TestAllTables.java
Line 41 (original), 52 (patched)


Could you please revert this to prevent wildcard import usage?



src/test/org/apache/sqoop/TestOrcImport.java
Lines 62-81 (patched)


Would you mind using the newly introduced ArgumentArrayBuilder logic in these 
tests instead? It was added in https://reviews.apache.org/r/65607/


- Boglarka Egyed


On May 2, 2018, 12:12 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> ---
> 
> (Updated May 2, 2018, 12:12 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
> https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then 
> CTAS from that. This would push the responsibility of creating the ORC format 
> to Hive. However, it would result in writing every record twice: in text 
> format and in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
> micromanaged) ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -
> 
>   ivy.xml 1f587f3e 
>   ivy/libraries.properties 565a8bf5 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/export-purpose.txt def6ead3 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import-purpose.txt c7eca606 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 3b542102 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java c62ee98c 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2c474b7e 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestAllTables.java 16933a82 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java f6d591b7 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
>   src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/11/
> 
> 
> Testing
> ---
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>



Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-05-30 Thread Szabolcs Vasas


> On May 24, 2018, 1:07 p.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/TestOrcImport.java
> > Lines 50 (patched)
> > 
> >
> > It seems that this test case only covers the HDFS import, can we add 
> > test cases which cover the Hive import too?
> 
> daniel voros wrote:
> 95% of this patch is making the HDFS import work, since the Hive import 
> part is only issuing a LOAD DATA statement. That part is tested in 
> `TestTableDefWriter`. I believe that adding tests similar to the ones in 
> `TestHiveImport` wouldn't make much sense, since they're only checking whether 
> the CREATE TABLE/LOAD DATA statements are as expected.
> 
> Please let me know if you can think of a better way to test the Hive 
> import!

Yeah, I agree that TestHiveImport is not an example of an ideal test case...
TestTableDefWriter checks if 'STORED AS ORC' is in the CREATE TABLE statement 
but it does not check if the statement is syntactically correct.
I think it would make sense to parameterize 
org.apache.sqoop.hive.TestHiveServer2TextImport to test all the supported file 
formats. It would be a really small change to the test but it would make sure 
that the CREATE TABLE statement is correct. 
Once https://issues.apache.org/jira/browse/SQOOP-3328 is done I could add the 
Parquet case to this test as well.


- Szabolcs


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review203762
---


On May 2, 2018, 12:12 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> ---
> 
> (Updated May 2, 2018, 12:12 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
> https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then 
> CTAS from that. This would push the responsibility of creating the ORC format 
> to Hive. However, it would result in writing every record twice: in text 
> format and in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
> micromanaged) ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -
> 
>   ivy.xml 1f587f3e 
>   ivy/libraries.properties 565a8bf5 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/export-purpose.txt def6ead3 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import-purpose.txt c7eca606 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba4 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 783651a4 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java ee79d8b7 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestAllTables.java 16933a82 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
>   src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/10/
> 
> 
> Testing
> ---
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>



Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-05-24 Thread Szabolcs Vasas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review203762
---



Hi Dani,

Thank you for submitting this new feature, and sorry for the late review.
I have left some minor comments inline, and I suggest adding some more 
documentation explaining what exactly is supported with ORC files.
The implementation suggests that we only support HDFS and Hive import at the 
moment, so export is not covered yet. If this is true, I think we should 
emphasize it in the documentation.


src/java/org/apache/sqoop/hive/TableDefWriter.java
Lines 194 (patched)


I am not that familiar with the Hive CREATE TABLE statement, but as far as I 
understand, 'STORED AS ORC' basically means that we will use 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, and 
org.apache.hadoop.hive.ql.io.orc.OrcSerde. Is that correct?
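As far as I can tell that is exactly what the shorthand expands to; a sketch of 
the expanded form (class names as in Hive 2.x, worth verifying against the Hive 
version Sqoop targets; the table name is hypothetical):

```sql
-- 'STORED AS ORC' is shorthand for roughly this:
CREATE TABLE example (id INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```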



src/java/org/apache/sqoop/hive/TableDefWriter.java
Line 197 (original), 200 (patched)


Does ORC support different compression codecs? If so, I think we should 
emphasize in the documentation (and/or implement a fail fast) that Sqoop does 
not currently support the compression-codec option with ORC files.
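For context, ORC does support several codecs (ZLIB is the default; SNAPPY and 
NONE are also available), but they are normally selected per table via 
TBLPROPERTIES rather than through a Sqoop option. A sketch with a hypothetical 
table name:

```sql
-- Compression for an ORC table is set with the orc.compress table property:
CREATE TABLE example (id INT)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');
```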



src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java
Lines 41 (patched)


Can we just use NullWritable.get() instead of introducing a field called 
"nada"?



src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java
Lines 42 (patched)


Can we make this field private?



src/java/org/apache/sqoop/util/OrcConversionContext.java
Lines 98 (patched)


typo: tinyiny



src/java/org/apache/sqoop/util/OrcUtil.java
Lines 55 (patched)


We use Hive types as ORC schema types here; is this always going to be 
correct?
I am not too familiar with the ORC types; do they support all the Hive data 
types?



src/test/org/apache/sqoop/TestOrcImport.java
Lines 50 (patched)


It seems that this test case only covers the HDFS import, can we add test 
cases which cover the Hive import too?


- Szabolcs Vasas


On May 2, 2018, 12:12 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> ---
> 
> (Updated May 2, 2018, 12:12 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
> https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then 
> CTAS from that. This would push the responsibility of creating the ORC format 
> to Hive. However, it would result in writing every record twice: in text 
> format and in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
> micromanaged) ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -
> 
>   ivy.xml 1f587f3e 
>   ivy/libraries.properties 565a8bf5 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba4 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 783651a4 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java ee79d8b7 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   

Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-05-10 Thread Attila Szabo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review202879
---



Hi Daniel,

I did not have time for a full review yet, though I already found a few smaller 
flaws, and one conceptual one.

Conceptual:
I'd strongly suggest extracting the ORC reading/writing support into a subtask, 
separate from the ACID Hive tables work. The clear advantages would be:
We would be able to test ORC file import/export without any Hive related 
changes, and thus verify correctness with PrestoDB or any other tool that can 
read/write ORC files.
What do you think?

Smaller ones:
See inline.

Regards,
Attila


ivy.xml
Lines 144-149 (patched)


Are we sure this is the complete exclude list that we need, and that no Hive or 
other library related "debris" is coming with it?



ivy/libraries.properties
Lines 62 (patched)


Shouldn't we aim for the nohive version of 1.4.3?



src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java
Lines 47-51 (patched)


Aren't we missing a super call here?



src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java
Lines 53 (patched)


According to the general code guidelines and Joshua Bloch (Effective Java), 
it is a very dangerous practice to initialize an object in a non-final 
initializer.

Please correct this!


- Attila Szabo


On May 2, 2018, 12:12 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> ---
> 
> (Updated May 2, 2018, 12:12 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
> https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then 
> CTAS from that. This would push the responsibility of creating the ORC format 
> to Hive. However, it would result in writing every record twice: in text 
> format and in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
> micromanaged) ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -
> 
>   ivy.xml 6af94d9d 
>   ivy/libraries.properties c44b50bc 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba4 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 783651a4 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java ee79d8b7 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestAllTables.java 56d1f577 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
>   src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/7/
> 
> 
> Testing
> ---
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>



Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-05-02 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/
---

(Updated May 2, 2018, 12:12 p.m.)


Review request for Sqoop.


Changes
---

Patch #6 fixes `TestOrcImport#testDatetimeTypeOverrides` (fixed timezone).


Bugs: SQOOP-3311
https://issues.apache.org/jira/browse/SQOOP-3311


Repository: sqoop-trunk


Description
---

Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
by default. This will probably result in increased usage of ACID tables and the 
need to support importing into ACID tables with Sqoop.

Currently the only table format supporting full ACID tables is ORC.

The easiest and most effective way to support importing into these tables would 
be to write out files as ORC and keep using LOAD DATA as we do for all other 
Hive tables (supported since HIVE-17361).

A workaround could be to create the table as textfile (as before) and then CTAS 
from that. This would push the responsibility of creating the ORC format to 
Hive. However, it would result in writing every record twice: in text format 
and in ORC.

Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
micromanaged) ACID tables can use an arbitrary file format.

Supporting full ACID tables would also be the first step in making 
"lastmodified" incremental imports work with Hive.
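The two approaches above can be sketched in HiveQL (table and path names here 
are hypothetical, and the exact statements Sqoop generates may differ):

```sql
-- Preferred: create the target table as ORC and load the ORC files
-- written by Sqoop directly (LOAD DATA into ACID tables: HIVE-17361).
CREATE TABLE employees (id INT, name STRING)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

LOAD DATA INPATH '/user/sqoop/employees' INTO TABLE employees;

-- Workaround: import as textfile first, then CTAS into ORC.
-- Every record is written twice (once as text, once as ORC).
CREATE TABLE employees_text (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE employees
STORED AS ORC
TBLPROPERTIES ('transactional'='true')
AS SELECT * FROM employees_text;
```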


Diffs (updated)
-

  ivy.xml 6af94d9d 
  ivy/libraries.properties c44b50bc 
  src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
  src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba4 
  src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 783651a4 
  src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
  src/java/org/apache/sqoop/tool/ImportTool.java ee79d8b7 
  src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
  src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
  src/test/org/apache/sqoop/TestAllTables.java 56d1f577 
  src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
  src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
  src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
  src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 


Diff: https://reviews.apache.org/r/66548/diff/6/

Changes: https://reviews.apache.org/r/66548/diff/5-6/


Testing
---

- added some unit tests
- tested basic Hive import scenarios on a cluster


Thanks,

daniel voros



Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-04-11 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review200902
---



Patch #1 is an initial patch that contains the most fundamental changes to 
support ORC importing. I'll add documentation and extend the tests with 
third-party tests etc., but I wanted to share it to get feedback early on.

- daniel voros


On April 11, 2018, 12:02 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> ---
> 
> (Updated April 11, 2018, 12:02 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
> https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then 
> CTAS from that. This would push the responsibility of creating the ORC format 
> to Hive. However, it would result in writing every record twice: in text 
> format and in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
> micromanaged) ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -
> 
>   ivy.xml 6be4fa2 
>   ivy/libraries.properties c44b50b 
>   src/java/org/apache/sqoop/SqoopOptions.java 651cebd 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java b7a25b7 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b02e4fe 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c0 
>   src/java/org/apache/sqoop/tool/ImportTool.java e992005 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 8bdc3be 
>   src/test/org/apache/sqoop/orm/TestClassWriter.java 0cc07cf 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/1/
> 
> 
> Testing
> ---
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>