
[GitHub] spark pull request: [SPARK-3039] [BUILD] Spark assembly for new ha...

2015-02-03 Thread medale

Github user medale commented on the pull request:

https://github.com/apache/spark/pull/4315#issuecomment-72785613
  
The problem was that the Spark project hive-exec 0.13.1a depends on

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
</dependency>
```

(see 
http://central.maven.org/maven2/org/spark-project/hive/hive-exec/0.13.1a/hive-exec-0.13.1a.pom)

Its parent defines avro.version as 1.7.5

<avro.version>1.7.5</avro.version>

(see 
http://central.maven.org/maven2/org/spark-project/hive/hive/0.13.1a/hive-0.13.1a.pom)

The only place hive-exec is being used as a dependency is in:

find . -name pom.xml | xargs grep hive-exec
pom.xml (where we define it in dependencyManagement section)
sql/hive/pom.xml (in actual dependencies)

In sql/hive/pom.xml we also explicitly have dependency on:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```

Therefore, if we choose a profile that does not define avro.mapred.classifier,
this field is left empty (see main pom.xml:
<avro.mapred.classifier></avro.mapred.classifier>).
We pull avro-mapred-1.7.6.jar (exactly the same as
avro-mapred-1.7.6-hadoop1.jar), as it should be.

If we choose a profile like hadoop-2.4 we set it to hadoop2 and pull:
avro-mapred-1.7.6-hadoop2.jar as it should be.

```
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
    <hbase.version>0.98.7-hadoop2</hbase.version>
    <commons.math3.version>3.1.1</commons.math3.version>
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```

However, with changes in 1.3.0-SNAPSHOT the avro-mapred's scope is newly 
defined as:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
  <scope>${hive.deps.scope}</scope>
</dependency>
```

That scope is in main pom.xml:
<hive.deps.scope>compile</hive.deps.scope>
assembly/pom.xml:<hive.deps.scope>provided</hive.deps.scope>
examples/pom.xml:<hive.deps.scope>provided</hive.deps.scope>

The same applies to hive-exec. So competing avro-mapred classes will no longer be
included in the spark-assembly.jar. They are also not on the Hadoop
classpath (only Avro itself is), so they need to be supplied by the job. That
will be new for Avro users. Still, excluding avro-mapred from the hive-exec
dependency and explicitly specifying avro-mapred 1.7.6 with the correct
classifier will be necessary if anything like the Maven Enforcer plugin is
ever run.
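
For a job that now has to bring its own avro-mapred, the declaration might look
like the following sketch. It assumes a Hadoop 2 build and the 1.7.6 version
discussed above; the job's own pom.xml around it is hypothetical and not part
of this pull request.

```xml
<!-- Sketch for a job's pom.xml (hypothetical project, not Spark's own POM). -->
<!-- Assumes a Hadoop 2 cluster; drop the classifier for a Hadoop 1 build. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.6</version>
  <classifier>hadoop2</classifier>
</dependency>
```

Because the classifier is part of the artifact coordinates, leaving it out
resolves avro-mapred-1.7.6.jar, the Hadoop 1 variant.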



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3039] [BUILD] Spark assembly for new ha...

2015-02-02 Thread medale

GitHub user medale opened a pull request:

https://github.com/apache/spark/pull/4315

[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains
avro-mapred for hadoop 1 API

SPARK-3039 had been marked as resolved but did not work for at least some
builds due to a version conflict between avro-mapred-1.7.5.jar and
avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for
hadoop2.

In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on avro-mapred 1.7.5:

Building Spark Project Hive 1.2.0
[INFO] 

[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding this transitive dependency allows the explicitly listed
avro-mapred dependency to be picked up.
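
A minimal sketch of what such an exclusion could look like in sql/hive/pom.xml.
It assumes the hive-exec version is managed in the parent pom.xml's
dependencyManagement section, so no version is given here; the actual patch may
differ in detail.

```xml
<!-- Sketch only: keep hive-exec but stop it from pulling in avro-mapred 1.7.5. -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With the transitive 1.7.5 artifact excluded, the only avro-mapred left in the
tree is the one declared explicitly with ${avro.mapred.classifier}.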

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/medale/spark avro-hadoop2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4315.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4315


commit 51b9c2a99b8cf7e931fc84419905d2f0a23bce3d
Author: medale medal...@yahoo.com
Date:   2015-02-02T22:00:54Z

[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains 
avro-mapred for
hadoop 1 API had been marked as resolved but did not work for at least some
builds due to version conflicts using avro-mapred-1.7.5.jar and
avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for 
hadoop2.

sql/hive/pom.xml org.spark-project.hive:hive-exec's depends on 1.7.5:

Building Spark Project Hive 1.2.0
[INFO] 

[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 
---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding this dependency allows the explicitly listed avro-mapred 
dependency
to be picked up.





[GitHub] spark pull request: Sbt and Maven builds pass on Linux boxes with ...

2014-10-24 Thread medale

Github user medale commented on the pull request:

https://github.com/apache/spark/pull/2883#issuecomment-60467372
  
I ran into the exact same file name too long problem when running the 
Maven build on an encrypted ext4 partition. The changes in the pom.xml file in 
this pull request solved the problem and it now builds fine. Thank you for 
sharing!


