[jira] [Created] (SQOOP-3322) Version differences between ivy configurations

2018-05-04 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3322:
---

 Summary: Version differences between ivy configurations
 Key: SQOOP-3322
 URL: https://issues.apache.org/jira/browse/SQOOP-3322
 Project: Sqoop
  Issue Type: Bug
  Components: build
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros


We have multiple ivy configurations defined in ivy.xml.
 - The {{redist}} configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
 - The {{common}} configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
 - The {{test}} configuration is used to set the classpath during junit 
execution. It extends the {{common}} config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.

Differences:
||Artifact||redist||common (compilation)||test||
|commons-pool|not in redist|1.5.4|*1.6*|
|commons-codec|*1.4*|1.9|1.9|
|commons-io|*1.4*|2.4|2.4|
|commons-logging|*1.1.1*|1.2|1.2|
|slf4j-api|*1.6.1*|1.7.7|1.7.7|

I'd suggest using the version *in bold* in all three configurations, based on:
 - keep version from redist (where there is one), since that's the version we 
were shipping with and used in production
 - keep the latest version in case of commons-pool that is not part of the 
redist config

To achieve this we should exclude these artifacts from the transitive 
dependencies and define them explicitly.
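
To illustrate how differences like the ones in the table can be detected, here 
is a minimal sketch in Java. It assumes the resolved versions of each 
configuration have already been collected into maps (for example from the ivy 
resolve reports). The class {{IvyVersionDiff}} and its methods are hypothetical 
helpers, not part of Sqoop or Ivy; the sample data is taken from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Sqoop or Ivy.
final class IvyVersionDiff {

    // Builds an artifact -> version map from alternating name/version arguments.
    static Map<String, String> conf(String... artifactVersionPairs) {
        Map<String, String> versions = new LinkedHashMap<>();
        for (int i = 0; i + 1 < artifactVersionPairs.length; i += 2) {
            versions.put(artifactVersionPairs[i], artifactVersionPairs[i + 1]);
        }
        return versions;
    }

    // Input: configuration name -> (artifact -> version).
    // Output: artifact -> (configuration -> version), restricted to artifacts
    // that resolve to more than one distinct version across configurations.
    static Map<String, Map<String, String>> findDifferences(
            Map<String, Map<String, String>> versionsByConf) {
        Set<String> artifacts = new TreeSet<>();
        for (Map<String, String> versions : versionsByConf.values()) {
            artifacts.addAll(versions.keySet());
        }
        Map<String, Map<String, String>> differences = new TreeMap<>();
        for (String artifact : artifacts) {
            Map<String, String> perConf = new LinkedHashMap<>();
            Set<String> distinct = new TreeSet<>();
            for (Map.Entry<String, Map<String, String>> e : versionsByConf.entrySet()) {
                String version = e.getValue().get(artifact);
                if (version != null) { // artifact may be absent from a configuration
                    perConf.put(e.getKey(), version);
                    distinct.add(version);
                }
            }
            if (distinct.size() > 1) {
                differences.put(artifact, perConf);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> versionsByConf = new LinkedHashMap<>();
        versionsByConf.put("redist", conf("commons-codec", "1.4", "commons-io", "1.4",
                "commons-logging", "1.1.1", "slf4j-api", "1.6.1"));
        versionsByConf.put("common", conf("commons-pool", "1.5.4", "commons-codec", "1.9",
                "commons-io", "2.4", "commons-logging", "1.2", "slf4j-api", "1.7.7"));
        versionsByConf.put("test", conf("commons-pool", "1.6", "commons-codec", "1.9",
                "commons-io", "2.4", "commons-logging", "1.2", "slf4j-api", "1.7.7"));
        findDifferences(versionsByConf)
                .forEach((artifact, versions) -> System.out.println(artifact + " " + versions));
    }
}
```

A check like this could run after the fix to confirm that no artifact resolves 
to more than one version across {{redist}}, {{common}} and {{test}}.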



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3317) org.apache.sqoop.validation.RowCountValidator in live RDBMS system

2018-05-04 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463723#comment-16463723
 ] 

Daniel Voros commented on SQOOP-3317:
-

Hi [~srikumaran.t], thank you for reporting this!

As far as I can tell, currently the only option for validation is to check for 
an exact match for the number of records. "Percentage tolerant" validation is 
only mentioned in the documentation but is not implemented.

In my opinion this kind of validation (comparing the number of records) doesn't 
make much sense and should only be used as a sanity check, since it doesn't 
guarantee the equality of the contents.

However, we could improve the existing implementation by introducing another 
parameter (a margin/threshold) so that an exact match is not required, and we 
could also implement "Percentage tolerant" validation.
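
A minimal sketch of what such tolerant checks could look like, in Java. The 
class {{RowCountTolerance}} and its method names are hypothetical and do not 
correspond to existing Sqoop classes; how the margin or percentage would be 
passed in as a parameter is left open.

```java
// Hypothetical sketch; not an existing Sqoop class.
final class RowCountTolerance {

    // Passes when the counts differ by at most `margin` rows.
    static boolean withinAbsoluteMargin(long sourceCount, long targetCount, long margin) {
        if (margin < 0) {
            throw new IllegalArgumentException("margin must not be negative");
        }
        return Math.abs(sourceCount - targetCount) <= margin;
    }

    // Passes when the difference is at most `percent` percent of the source count.
    static boolean withinPercentage(long sourceCount, long targetCount, double percent) {
        if (percent < 0) {
            throw new IllegalArgumentException("percent must not be negative");
        }
        if (sourceCount == 0) {
            // Avoid dividing by zero: an empty source only matches an empty target.
            return targetCount == 0;
        }
        double differencePercent =
                Math.abs((double) sourceCount - targetCount) * 100.0 / sourceCount;
        return differencePercent <= percent;
    }
}
```

With a margin of 0 (or 0 percent) both checks reduce to the current 
exact-match behaviour, so such a parameter could default to that.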

> org.apache.sqoop.validation.RowCountValidator in live RDBMS system
> --
>
> Key: SQOOP-3317
> URL: https://issues.apache.org/jira/browse/SQOOP-3317
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Sri Kumaran Thirupathy
>Priority: Major
>
> org.apache.sqoop.validation.RowCountValidator retrieves the count from the 
> source after the MR job completes. This fails when the RDBMS is live.
> org.apache.sqoop.validation.RowCountValidator could retrieve the count during 
> the MR execution phase.
> Also, how do I use Percentage Tolerant? Reference: 
> [https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html]





[jira] [Commented] (SQOOP-3321) TestHiveImport is failing on Jenkins

2018-05-04 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463608#comment-16463608
 ] 

Daniel Voros commented on SQOOP-3321:
-

[~BoglarkaEgyed] this is failing for me on Linux as well. I believe this is due 
to case sensitivity of file names there (as opposed to macOS). The table name 
gets converted to lowercase when importing but we're referring to it with its 
original casing when trying to verify its contents in {{ParquetReader}}.

Tests are passing after converting these three table names to all lowercase in 
TestHiveImport:
 - APPEND_HIVE_IMPORT_AS_PARQUET
 - NORMAL_HIVE_IMPORT_AS_PARQUET
 - CREATE_OVERWRITE_HIVE_IMPORT_AS_PARQUET

Since SQOOP-3318 only changed the tests, I think we should adapt to the 
lowercase names in the tests too. The easiest solution would be to use 
lowercase names. What do you think [~vasas]?
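
To make the mismatch concrete, here is a small sketch in Java. It assumes the 
imported table ends up in a directory named after the lowercased table name; 
the class {{ImportedTablePath}}, its method and the warehouse directory are 
hypothetical and are not taken from TestHiveImport or {{ParquetReader}}.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical sketch; names and directory layout are assumptions.
final class ImportedTablePath {

    // Resolves the directory to verify, using the lowercased table name so the
    // lookup also works on case-sensitive file systems.
    static Path resolve(String warehouseDir, String tableName) {
        // Locale.ROOT keeps the conversion independent of the default locale.
        return Paths.get(warehouseDir, tableName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // "warehouse" is a placeholder directory name.
        System.out.println(resolve("warehouse", "NORMAL_HIVE_IMPORT_AS_PARQUET"));
    }
}
```

Using lowercase table names in the tests directly avoids the need for any such 
conversion at verification time.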

> TestHiveImport is failing on Jenkins
> 
>
> Key: SQOOP-3321
> URL: https://issues.apache.org/jira/browse/SQOOP-3321
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Boglarka Egyed
>Priority: Major
> Attachments: TEST-org.apache.sqoop.hive.TestHiveImport.txt
>
>
> org.apache.sqoop.hive.TestHiveImport has been failing since 
> [SQOOP-3318|https://reviews.apache.org/r/66761/bugs/SQOOP-3318/] was 
> committed. This test seems to be failing only in the Jenkins environment, as 
> it passes on several local machines. There may be some difference in the 
> filesystem that causes this issue; this should be investigated. I am 
> attaching the log from a failed run.


