[jira] [Comment Edited] (SQOOP-2981) sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu

2016-07-05 Thread Attila Szabo (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363788#comment-15363788
 ] 

Attila Szabo edited comment on SQOOP-2981 at 7/6/16 5:41 AM:
-

Hi [~Tagar],

Could you please also perform profiling with the help of Joeri's tool ( 
https://github.com/cerndb/Hadoop-Profiler )? It requires JDK 8u60 or above. 
Would that be doable on your side?

The reason I'm asking for that is the following:
JdbcWritableBridge#readBigDecimal is just a shorthand for 
java.sql.ResultSet#getBigDecimal(int), so the implementation lives in the 
Oracle JDBC driver, and thus Sqoop does not have much control over that 
method.
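
For illustration, this is roughly what that shorthand looks like (a sketch from 
memory of the Sqoop 1.x sources, not a verbatim copy); the bridge adds nothing 
on top of the driver call, so all of the BigDecimal construction cost shows up 
under this frame:
{quote}
// Approximate shape of com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal:
// it simply delegates to the JDBC driver's ResultSet implementation.
public static java.math.BigDecimal readBigDecimal(int colNum, java.sql.ResultSet r)
    throws java.sql.SQLException {
  return r.getBigDecimal(colNum);
}
{quote}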

Meanwhile I'll try to investigate your scenario on my side and check whether I 
can find any clue as to what is causing your performance issue (or whether we 
could find a workaround, e.g. mapping to a different type instead of 
BigDecimal; see the sketch below).
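
As a purely illustrative sketch of that kind of workaround (hypothetical column 
mappings based on the reporter's DDL, and assuming --map-column-java is honored 
on your Parquet import path and that String output is acceptable downstream), 
Sqoop's --map-column-java option can force the NUMBER columns to be read via 
readString()/getString() instead of readBigDecimal():
{quote}
sqoop import \
--connect "jdbc:oracle:thin:@jdbc_tns" \
--username username --password password \
--query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS" \
--map-column-java ID=String,MERCHID=String,RECENCY=String,FREQUENCY=String,SPEND=String,SPENDING=String,FREQ=String \
--as-parquetfile \
--target-dir hdfs_dir \
--num-mappers num_mappers
{quote}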

Cheers,
Maugli


was (Author: maugli):
Hi [~Tagar],

Could you please also perform profiling with the help of Joremi's tool ( 
https://github.com/cerndb/Hadoop-Profiler )? It requires JDK 8u60 or above. 
Would that be doable on your side?

The reason I'm asking for that is the following:
JdbcWritableBridge#readBigDecimal is just a shorthand for 
java.sql.ResultSet#getBigDecimal(int), so the implementation lives in the 
Oracle JDBC driver, and thus Sqoop does not have much control over that 
method.

Meanwhile I'll try to investigate your scenario on my side and check whether I 
can find any clue as to what is causing your performance issue (or whether we 
could find a workaround, e.g. mapping to a different type instead of 
BigDecimal).

Cheers,
Maugli

> sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
> --
>
> Key: SQOOP-2981
> URL: https://issues.apache.org/jira/browse/SQOOP-2981
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors/oracle, sqoop2-jdbc-connector
>Affects Versions: 1.4.5, 1.4.6
> Environment: sqoop import from Oracle; saves as parquet file
>Reporter: Ruslan Dautkhanov
>  Labels: decimal, import, jdbc, oracle, parquet
>
> The majority of the time spent in a sqoop import from Oracle was on converting 
> Decimal values. That was 2.5x more than the next-biggest CPU consumer (Snappy 
> compression). Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.
> {quote}
>  JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
>  http://code.google.com/p/jvmtop
>  Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20
>   38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
>   14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
>   12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
>   10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
>    3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
>    2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
>    2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
>    1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>    1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
>    1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
>    0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>    0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
>    0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
>    0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
>    0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
>    0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
>    0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
> {quote}
> DDL of the table on Oracle side:
> {quote}
> CREATE TABLE someschema.sometable
> (
>   ID NUMBER NOT NULL,
>   psn  VARCHAR2(50 BYTE),
>   MERCHID NUMBER,
>   RECENCY NUMBER,
>   FREQUENCY  NUMBER,
>   SPEND  NUMBER,
>   SPENDING   NUMBER,
>   FREQ   NUMBER
> )
> {quote}
> Sqoop parameters:
> {quote}
> sqoop import 
> -Dmapred.job.name="sqoop import into out_table" 
> --connect "jdbc:oracle:thin:@jdbc_tns" 
> --username username --password password
> --direct 
> --compress 

[jira] [Commented] (SQOOP-2981) sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu

2016-07-05 Thread Attila Szabo (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363788#comment-15363788
 ] 

Attila Szabo commented on SQOOP-2981:
-

Hi [~Tagar],

Could you please also perform profiling with the help of Joeri's tool ( 
https://github.com/cerndb/Hadoop-Profiler )? It requires JDK 8u60 or above. 
Would that be doable on your side?

The reason I'm asking for that is the following:
JdbcWritableBridge#readBigDecimal is just a shorthand for 
java.sql.ResultSet#getBigDecimal(int), so the implementation lives in the 
Oracle JDBC driver, and thus Sqoop does not have much control over that 
method.

Meanwhile I'll try to investigate your scenario on my side and check whether I 
can find any clue as to what is causing your performance issue (or whether we 
could find a workaround, e.g. mapping to a different type instead of 
BigDecimal).

Cheers,
Maugli

> sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
> --
>
> Key: SQOOP-2981
> URL: https://issues.apache.org/jira/browse/SQOOP-2981
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors/oracle, sqoop2-jdbc-connector
>Affects Versions: 1.4.5, 1.4.6
> Environment: sqoop import from Oracle; saves as parquet file
>Reporter: Ruslan Dautkhanov
>  Labels: decimal, import, jdbc, oracle, parquet
>
> The majority of the time spent in a sqoop import from Oracle was on converting 
> Decimal values. That was 2.5x more than the next-biggest CPU consumer (Snappy 
> compression). Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.
> {quote}
>  JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
>  http://code.google.com/p/jvmtop
>  Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20
>   38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
>   14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
>   12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
>   10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
>    3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
>    2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
>    2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
>    1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>    1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
>    1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
>    0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>    0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
>    0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
>    0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
>    0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
>    0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
>    0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
> {quote}
> DDL of the table on Oracle side:
> {quote}
> CREATE TABLE someschema.sometable
> (
>   ID NUMBER NOT NULL,
>   psn  VARCHAR2(50 BYTE),
>   MERCHID NUMBER,
>   RECENCY NUMBER,
>   FREQUENCY  NUMBER,
>   SPEND  NUMBER,
>   SPENDING   NUMBER,
>   FREQ   NUMBER
> )
> {quote}
> Sqoop parameters:
> {quote}
> sqoop import 
> -Dmapred.job.name="sqoop import into out_table" 
> --connect "jdbc:oracle:thin:@jdbc_tns" 
> --username username --password password
> --direct 
> --compress --compression-codec snappy 
> --as-parquetfile 
> --target-dir hdfs_dir
> --num-mappers num_mappers
> --query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2981) sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu

2016-07-05 Thread Ruslan Dautkhanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov updated SQOOP-2981:
-
Description: 
The majority of the time spent in a sqoop import from Oracle was on converting 
Decimal values. That was 2.5x more than the next-biggest CPU consumer (Snappy 
compression). 

Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.

{quote}
 JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
 http://code.google.com/p/jvmtop

 Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20

  38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
  14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
  12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
  10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
   3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
   2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
   2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
   1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
   1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
   0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
   0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
   0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
   0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
   0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
   0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
{quote}

DDL of the table on Oracle side:
{quote}
CREATE TABLE someschema.sometable
(
  ID NUMBER NOT NULL,
  psn  VARCHAR2(50 BYTE),
  MERCHID NUMBER,
  RECENCY NUMBER,
  FREQUENCY  NUMBER,
  SPEND  NUMBER,
  SPENDING   NUMBER,
  FREQ   NUMBER
)
{quote}

Sqoop parameters:
{quote}
sqoop import 
-Dmapred.job.name="sqoop import into out_table" 
--connect "jdbc:oracle:thin:@jdbc_tns" 
--username username --password password
--direct 
--compress --compression-codec snappy 
--as-parquetfile 
--target-dir hdfs_dir
--num-mappers num_mappers
--query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
{quote}

  was:
The majority of the time spent in a sqoop import from Oracle was on converting 
Decimal values. That was 2.5x more than the next-biggest CPU consumer (Snappy 
compression). 

Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.

{quote}
 JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
 http://code.google.com/p/jvmtop

 Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20

  38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
  14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
  12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
  10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
   3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
   2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
   2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
   1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
   1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
   0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
   0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
   0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
   0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
   0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
   0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
{quote}

DDL of the table on Oracle side:
{quote}
CREATE TABLE someschema.sometable
(
  ID NUMBER 

[jira] [Updated] (SQOOP-2981) sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu

2016-07-05 Thread Ruslan Dautkhanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov updated SQOOP-2981:
-
Description: 
The majority of the time spent in a sqoop import from Oracle was on converting 
Decimal values. That was 2.5x more than the next-biggest CPU consumer (Snappy 
compression). 

Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.

{quote}
 JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
 http://code.google.com/p/jvmtop

 Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20

  38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
  14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
  12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
  10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
   3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
   2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
   2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
   1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
   1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
   0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
   0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
   0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
   0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
   0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
   0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
{quote}

DDL of the table on Oracle side:
{quote}
CREATE TABLE someschema.sometable
(
  ID NUMBER NOT NULL,
  psn  VARCHAR2(50 BYTE),
  MERCHID NUMBER,
  RECENCY NUMBER,
  FREQUENCY  NUMBER,
  SPEND  NUMBER,
  SPENDING   NUMBER,
  FREQ   NUMBER
)
{quote}

Sqoop parameters:
{quote}
sqoop import 
-Dmapred.job.name="sqoop import into {out_table}" 
--connect "jdbc:oracle:thin:@{jdbc_tns}" 
--username {username} --password {password} 
--direct 
--compress --compression-codec snappy 
--as-parquetfile 
--target-dir {hdfs_dir1} 
--num-mappers {num_mappers} 
--query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
{quote}

  was:
The majority of the time spent in a sqoop import from Oracle was on converting 
Decimal values. That was 2.5x more than the next-biggest CPU consumer (Snappy 
compression). 

Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.

{quote}
 JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
 http://code.google.com/p/jvmtop

 Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20

  38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
  14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
  12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
  10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
   3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
   2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
   2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
   1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
   1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
   0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
   0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
   0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
   0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
   0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
   0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
{quote}

DDL of the table on Oracle side:
{quote}
CREATE TABLE someschema.sometable
(
  ID NUMBER 

[jira] [Created] (SQOOP-2981) sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu

2016-07-05 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created SQOOP-2981:


 Summary: sqoop import from jdbc: 
JdbcWritableBridge.readBigDecimal() takes a ton of cpu
 Key: SQOOP-2981
 URL: https://issues.apache.org/jira/browse/SQOOP-2981
 Project: Sqoop
  Issue Type: Bug
  Components: codegen, connectors/oracle, sqoop2-jdbc-connector
Affects Versions: 1.4.6, 1.4.5
 Environment: sqoop import from Oracle; saves as parquet file
Reporter: Ruslan Dautkhanov


The majority of the time spent in a sqoop import from Oracle was on converting 
Decimal values. That was 2.5x more than the next-biggest CPU consumer (Snappy 
compression). 

Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.

{quote}
 JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
 http://code.google.com/p/jvmtop

 Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20

  38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
  14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
  12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
  10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
   3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
   2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
   2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
   1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
   1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
   0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
   0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
   0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
   0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
   0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
   0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
   0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
   0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
{quote}

DDL of the table on Oracle side:
{quote}
CREATE TABLE someschema.sometable
(
  ID NUMBER NOT NULL,
  psn  VARCHAR2(50 BYTE),
  MERCHID NUMBER,
  RECENCY NUMBER,
  FREQUENCY  NUMBER,
  SPEND  NUMBER,
  SPENDING   NUMBER,
  FREQ   NUMBER
)
{quote}

Sqoop parameters:
{quote}
sqoop import \
-Dmapred.job.name="sqoop import into {out_table}" \
--connect "jdbc:oracle:thin:@{jdbc_tns}" \
--username {username} --password {password} \
--direct \
--compress --compression-codec snappy \
--as-parquetfile \
--target-dir {hdfs_dir1} \
--num-mappers {num_mappers} \
--query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2820) Sqoop2: Encryption over the REST interface

2016-07-05 Thread Abraham Fine (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abraham Fine updated SQOOP-2820:

Fix Version/s: 1.99.7

> Sqoop2: Encryption over the REST interface
> --
>
> Key: SQOOP-2820
> URL: https://issues.apache.org/jira/browse/SQOOP-2820
> Project: Sqoop
>  Issue Type: Improvement
>Affects Versions: 1.99.6
>Reporter: Abraham Fine
>Assignee: Abraham Fine
> Fix For: 1.99.7
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2930) Sqoop job exec not overriding the saved job generic properties

2016-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362321#comment-15362321
 ] 

ASF GitHub Bot commented on SQOOP-2930:
---

Github user git-rbanerjee commented on the issue:

https://github.com/apache/sqoop/pull/20
  
Thanks Jarek!

The patch has also been added:
https://issues.apache.org/jira/browse/SQOOP-2930
https://issues.apache.org/jira/secure/attachment/12806082/fixpatch_v1.patch

On Tue, Jul 5, 2016 at 9:45 AM, Jarek Jarcec Cecho  wrote:

> Hi @git-rbanerjee  ,
> Sqoop project currently does not accept pull requests. You will need to
> generate patch and upload it to the JIRA as per our instructions here:
>
> https://cwiki.apache.org/confluence/display/SQOOP/How+to+Contribute
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or mute the thread.
>



-- 
*Rabin Banerjee *



> Sqoop job exec not overriding the saved job generic properties
> --
>
> Key: SQOOP-2930
> URL: https://issues.apache.org/jira/browse/SQOOP-2930
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Rabin Banerjee
> Attachments: fixpatch_v1.patch
>
>
> Sqoop job exec is not overriding the saved job's generic properties.
> sqoop job -Dorg.apache.sqoop.xyz=xyz --create job1 -- import .. 
> sqoop job -Dorg.apache.sqoop.xyz=abc --exec job1
> exec does not override xyz with abc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] sqoop issue #20: SQOOP-2930 & SQOOP-1933 :: Sqoop job exec not overriding th...

2016-07-05 Thread git-rbanerjee
Github user git-rbanerjee commented on the issue:

https://github.com/apache/sqoop/pull/20
  
Thanks Jarek!

The patch has also been added:
https://issues.apache.org/jira/browse/SQOOP-2930
https://issues.apache.org/jira/secure/attachment/12806082/fixpatch_v1.patch

On Tue, Jul 5, 2016 at 9:45 AM, Jarek Jarcec Cecho  wrote:

> Hi @git-rbanerjee  ,
> Sqoop project currently does not accept pull requests. You will need to
> generate patch and upload it to the JIRA as per our instructions here:
>
> https://cwiki.apache.org/confluence/display/SQOOP/How+to+Contribute
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or mute the thread.
>



-- 
*Rabin Banerjee *



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---