Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-04 Thread Andrew Or
+1 (binding)

Ran the same tests I did for RC3:

Tested the standalone cluster mode REST submission gateway - submit /
status / kill
Tested simple applications on YARN client / cluster modes with and without
--jars
Tested python applications on YARN client / cluster modes with and without
--py-files*
Tested dynamic allocation on YARN client / cluster modes**
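For reference, a hedged sketch of the kinds of invocations behind these checks
(host names, ports, IDs and file names below are placeholders, not taken from
this thread):

  # standalone REST submission gateway (default port 6066): status / kill
  curl http://<master-host>:6066/v1/submissions/status/<submission-id>
  curl -X POST http://<master-host>:6066/v1/submissions/kill/<submission-id>

  # YARN cluster mode with extra python files
  ./bin/spark-submit --master yarn-cluster --py-files deps.zip app.py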

All good. A couple of known issues:

*SPARK-8017: YARN cluster python --py-files not working - not a blocker
(new feature)
** SPARK-8088: noisy output when min executors is set - not a blocker
(output can be disabled)

2015-06-04 13:35 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com:

 +1

 Tested on Mac OS X

  On Jun 4, 2015, at 1:09 PM, Patrick Wendell pwend...@gmail.com wrote:
 
  I will give +1 as well.
 
  On Wed, Jun 3, 2015 at 11:59 PM, Reynold Xin r...@databricks.com
 wrote:
  Let me give you the 1st
 
  +1
 
 
 
  On Tue, Jun 2, 2015 at 10:47 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Hi all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The
  exact commit and all other information is correct. (thanks Shivaram
  who pointed this out).
 
  On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  Please vote on releasing the following candidate as Apache Spark
 version
  1.4.0!
 
  The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  22596c534a38cfdda91aef18aa9037ab101e4251
 
  The release files, including signatures, digests, etc. can be found
 at:
 
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.0]
 
 https://repository.apache.org/content/repositories/orgapachespark-/
  [published as version: 1.4.0-rc4]
 
 https://repository.apache.org/content/repositories/orgapachespark-1112/
 
  The documentation corresponding to this release can be found at:
 
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.0!
 
  The vote is open until Saturday, June 06, at 05:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What has changed since RC3 ==
  In addition to many smaller fixes, three blocker issues were fixed:
  4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
  metadataHive get constructed too early
  6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
  78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be
 singleton
 
  == How can I help test this release? ==
  If you are a Spark user, you can help us test this release by
  taking a Spark 1.3 workload and running on this release candidate,
  then reporting any regressions.
 
  == What justifies a -1 vote for this release? ==
  This vote is happening towards the end of the 1.4 QA period,
  so -1 votes should only occur for significant regressions from 1.3.1.
  Bugs already present in 1.3.X, minor regressions, or bugs related
  to new features will not block this release.
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-04 Thread Calvin Jia
+1

Tested with input from Tachyon and persist off heap.
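A minimal sketch of that kind of check from spark-shell (the input URI is a
placeholder, and it assumes a Tachyon store is already configured for the
cluster, e.g. via the spark.tachyonStore.* settings in this release line):

  import org.apache.spark.storage.StorageLevel
  val data = sc.textFile("tachyon://tachyon-master:19998/input")  // hypothetical path
  data.persist(StorageLevel.OFF_HEAP)  // off-heap storage backed by Tachyon in 1.4
  data.count()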

On Thu, Jun 4, 2015 at 6:26 PM, Timothy Chen tnac...@gmail.com wrote:

 +1

 I've been testing cluster mode and client mode with Mesos on a 6-node cluster.

 Everything works so far.

 Tim

 On Jun 4, 2015, at 5:47 PM, Andrew Or and...@databricks.com wrote:

 +1 (binding)

 Ran the same tests I did for RC3:

 Tested the standalone cluster mode REST submission gateway - submit /
 status / kill
 Tested simple applications on YARN client / cluster modes with and without
 --jars
 Tested python applications on YARN client / cluster modes with and without
 --py-files*
 Tested dynamic allocation on YARN client / cluster modes**

 All good. A couple of known issues:

 *SPARK-8017: YARN cluster python --py-files not working - not a blocker
 (new feature)
 ** SPARK-8088: noisy output when min executors is set - not a blocker
 (output can be disabled)

 2015-06-04 13:35 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com:

 +1

 Tested on Mac OS X

  On Jun 4, 2015, at 1:09 PM, Patrick Wendell pwend...@gmail.com wrote:
 
  I will give +1 as well.
 
  On Wed, Jun 3, 2015 at 11:59 PM, Reynold Xin r...@databricks.com
 wrote:
  Let me give you the 1st
 
  +1
 
 
 
  On Tue, Jun 2, 2015 at 10:47 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Hi all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The
  exact commit and all other information is correct. (thanks Shivaram
  who pointed this out).
 
  On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  Please vote on releasing the following candidate as Apache Spark
 version
  1.4.0!
 
  The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  22596c534a38cfdda91aef18aa9037ab101e4251
 
  The release files, including signatures, digests, etc. can be found
 at:
 
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.0]
 
 https://repository.apache.org/content/repositories/orgapachespark-/
  [published as version: 1.4.0-rc4]
 
 https://repository.apache.org/content/repositories/orgapachespark-1112/
 
  The documentation corresponding to this release can be found at:
 
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.0!
 
  The vote is open until Saturday, June 06, at 05:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What has changed since RC3 ==
  In addition to many smaller fixes, three blocker issues were fixed:
  4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
  metadataHive get constructed too early
  6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and
 otherwise()
  78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be
 singleton
 
  == How can I help test this release? ==
  If you are a Spark user, you can help us test this release by
  taking a Spark 1.3 workload and running on this release candidate,
  then reporting any regressions.
 
  == What justifies a -1 vote for this release? ==
  This vote is happening towards the end of the 1.4 QA period,
  so -1 votes should only occur for significant regressions from 1.3.1.
  Bugs already present in 1.3.X, minor regressions, or bugs related
  to new features will not block this release.
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





Re: Anyone facing problem in incremental building of individual project

2015-06-04 Thread Meethu Mathew
Hi,
I added createDependencyReducedPom in my pom.xml and the problem is solved:

+  <!-- Work around MSHADE-148 -->
+  <createDependencyReducedPom>false</createDependencyReducedPom>
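For context, a hedged sketch of where that element sits, inside the
maven-shade-plugin configuration (standard plugin coordinates assumed; the
rest of the block matches the diff Ted posted below):

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <configuration>
      <shadedArtifactAttached>false</shadedArtifactAttached>
      <!-- Work around MSHADE-148 -->
      <createDependencyReducedPom>false</createDependencyReducedPom>
      ...
    </configuration>
  </plugin>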

Thank you @Steve and @Ted


Regards,

Meethu Mathew
Senior Engineer
Flytxt
On Thu, Jun 4, 2015 at 9:51 PM, Ted Yu yuzhih...@gmail.com wrote:

 Andrew Or put in this workaround :

 diff --git a/pom.xml b/pom.xml
 index 0b1aaad..d03d33b 100644
 --- a/pom.xml
 +++ b/pom.xml
 @@ -1438,6 +1438,8 @@
  <version>2.3</version>
  <configuration>
    <shadedArtifactAttached>false</shadedArtifactAttached>
 +  <!-- Work around MSHADE-148 -->
 +  <createDependencyReducedPom>false</createDependencyReducedPom>
    <artifactSet>
      <includes>
        <!-- At a minimum we must include this to force effective
  pom generation -->

 FYI

 On Thu, Jun 4, 2015 at 6:25 AM, Steve Loughran ste...@hortonworks.com
 wrote:


  On 4 Jun 2015, at 11:16, Meethu Mathew meethu.mat...@flytxt.com wrote:

  Hi all,

  I added some new code to MLlib. When I am trying to build only the
 mllib project using mvn --projects mllib/ -DskipTests clean install
 after setting export SPARK_PREPEND_CLASSES=true, the build is getting
 stuck with the following message.



  Excluding org.jpmml:pmml-schema:jar:1.1.15 from the shaded jar.
 [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.7 from the shaded
 jar.
 [INFO] Excluding com.sun.xml.bind:jaxb-core:jar:2.2.7 from the shaded
 jar.
 [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.7 from the shaded jar.
 [INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded
 jar.
 [INFO] Excluding org.scala-lang:scala-reflect:jar:2.10.4 from the shaded
 jar.
 [INFO] Replacing original artifact with shaded artifact.
 [INFO] Replacing
 /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT.jar
 with
 /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT-shaded.jar
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml

.



  I've seen something similar in a different build,

  It looks like MSHADE-148:
 https://issues.apache.org/jira/browse/MSHADE-148
 if you apply Tom White's patch, does your problem go away?





PySpark on PyPi

2015-06-04 Thread Olivier Girardot
Hi everyone,
Considering that the Python API is just a front-end that needs SPARK_HOME
defined anyway, I think it would be interesting to deploy the Python part of
Spark on PyPI so that pip can handle the PySpark dependency in Python projects
that need it.

For now I just symlink python/pyspark into my Python install's site-packages/
so that PyCharm and other lint tools work properly.
I can do the setup.py work or anything else that's needed.
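For what it's worth, a hedged sketch of the minimal setup.py I have in mind
(name, version and layout are placeholders, not an official artifact):

  from setuptools import setup, find_packages

  setup(
      name='pyspark',                        # hypothetical PyPI name
      version='1.4.0',                       # placeholder
      packages=find_packages(where='python'),
      package_dir={'': 'python'},
  )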

What do you think?

Regards,

Olivier.


Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Eron Wright
I saw something like this last night, with a similar message.  Is this what 
you’re referring to?

[error] 
org.deeplearning4j#dl4j-spark-ml;0.0.3.3.4.alpha1-SNAPSHOT!dl4j-spark-ml.jar 
origin location must be absolute: 
file:/Users/eron/.m2/repository/org/deeplearning4j/dl4j-spark-ml/0.0.3.3.4.alpha1-SNAPSHOT/dl4j-spark-ml-0.0.3.3.4.alpha1-SNAPSHOT.jar
java.lang.IllegalArgumentException: 
org.deeplearning4j#dl4j-spark-ml;0.0.3.3.4.alpha1-SNAPSHOT!dl4j-spark-ml.jar 
origin location must be absolute: 
file:/Users/eron/.m2/repository/org/deeplearning4j/dl4j-spark-ml/0.0.3.3.4.alpha1-SNAPSHOT/dl4j-spark-ml-0.0.3.3.4.alpha1-SNAPSHOT.jar
at org.apache.ivy.util.Checks.checkAbsolute(Checks.java:57)
at 
org.apache.ivy.core.cache.DefaultRepositoryCacheManager.getArchiveFileInCache(DefaultRepositoryCacheManager.java:385)
at 
org.apache.ivy.core.cache.DefaultRepositoryCacheManager.download(DefaultRepositoryCacheManager.java:849)
at 
org.apache.ivy.plugins.resolver.BasicResolver.download(BasicResolver.java:835)
at 
org.apache.ivy.plugins.resolver.RepositoryResolver.download(RepositoryResolver.java:282)
at 
org.apache.ivy.plugins.resolver.ChainResolver.download(ChainResolver.java:219)
at 
org.apache.ivy.plugins.resolver.ChainResolver.download(ChainResolver.java:219)
at 
org.apache.ivy.core.resolve.ResolveEngine.downloadArtifacts(ResolveEngine.java:388)
at 
org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:331)
at org.apache.ivy.Ivy.resolve(Ivy.java:517)
at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:266)
at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:175)
at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:157)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)
at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:128)
at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)
at sbt.IvySbt$$anon$4.call(Ivy.scala:64)
at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
at 
xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
at 
xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
at xsbt.boot.Using$.withResource(Using.scala:10)
at xsbt.boot.Using$.apply(Using.scala:9)
at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
at xsbt.boot.Locks$.apply0(Locks.scala:31)
at xsbt.boot.Locks$.apply(Locks.scala:28)
at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)
at sbt.IvySbt.withIvy(Ivy.scala:123)
at sbt.IvySbt.withIvy(Ivy.scala:120)
at sbt.IvySbt$Module.withModule(Ivy.scala:151)
at sbt.IvyActions$.updateEither(IvyActions.scala:157)
at 
sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1318)
at 
sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1315)
at 
sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$85.apply(Defaults.scala:1345)
at 
sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$85.apply(Defaults.scala:1343)
at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:35)
at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1348)
at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1342)
at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:45)
at sbt.Classpaths$.cachedUpdate(Defaults.scala:1360)
at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1300)
at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1275)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$4.work(System.scala:63)
at 
sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at 
sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.Execute.work(Execute.scala:235)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at 
sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 

Re: Ivy support in Spark vs. sbt

2015-06-04 Thread shane knapp
interesting...  i definitely haven't seen it happen that often in our build
system, and when it has happened, i wasn't able to determine the cause.

On Thu, Jun 4, 2015 at 10:16 AM, Marcelo Vanzin van...@cloudera.com wrote:

 On Thu, Jun 4, 2015 at 10:04 AM, shane knapp skn...@berkeley.edu wrote:

 this has occasionally happened on our jenkins as well (twice since last
 august), and deleting the cache fixes it right up.


 Yes deleting the cache fixes things, but it's kinda annoying to have to do
 that. And yesterday when I was testing a patch that actually used the ivy
 feature, I had to do that multiple times... that slows things down a lot.



 On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote:

 I've definitely seen the dependency path must be relative problem,
 and fixed it by deleting the ivy cache, but I don't know more than
 this.

 On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hey all,
 
  I've been bit by something really weird lately and I'm starting to
 think
  it's related to the ivy support we have in Spark, and running unit
 tests
  that use that code.
 
  The first thing that happens is that after running unit tests,
 sometimes my
  sbt builds start failing with error saying something about dependency
 path
  must be relative (sorry, don't have the exact error around). The
 dependency
  path it prints is a file: URL.
 
  I have a feeling that this is because Spark uses Ivy 2.4 while sbt
 uses Ivy
  2.3, and those might be incompatible. So if they get mixed up, things
 can
  break.
 
  The second is that sometimes unit tests fail with some weird error
  downloading dependencies. When checking the ivy metadata in
 ~/.ivy2/cache,
  the offending dependencies are pointing to my local maven repo (I have
  maven-local as one of the entries in my ~/.sbt/repositories).
 
  My feeling in this case is that Spark's version of Ivy somehow doesn't
  handle that case.
 
  So, long story short:
 
  - Has anyone run into either of these problems?
  - Is it possible to set some env variable or something during tests to
 force
  them to use their own directory instead of messing up and breaking my
  ~/.ivy2?
 
 
  --
  Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





 --
 Marcelo



Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Marcelo Vanzin
On Thu, Jun 4, 2015 at 10:04 AM, shane knapp skn...@berkeley.edu wrote:

 this has occasionally happened on our jenkins as well (twice since last
 august), and deleting the cache fixes it right up.


Yes deleting the cache fixes things, but it's kinda annoying to have to do
that. And yesterday when I was testing a patch that actually used the ivy
feature, I had to do that multiple times... that slows things down a lot.



 On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote:

 I've definitely seen the dependency path must be relative problem,
 and fixed it by deleting the ivy cache, but I don't know more than
 this.

 On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hey all,
 
  I've been bit by something really weird lately and I'm starting to think
  it's related to the ivy support we have in Spark, and running unit tests
  that use that code.
 
  The first thing that happens is that after running unit tests,
 sometimes my
  sbt builds start failing with error saying something about dependency
 path
  must be relative (sorry, don't have the exact error around). The
 dependency
  path it prints is a file: URL.
 
  I have a feeling that this is because Spark uses Ivy 2.4 while sbt uses
 Ivy
  2.3, and those might be incompatible. So if they get mixed up, things
 can
  break.
 
  The second is that sometimes unit tests fail with some weird error
  downloading dependencies. When checking the ivy metadata in
 ~/.ivy2/cache,
  the offending dependencies are pointing to my local maven repo (I have
  maven-local as one of the entries in my ~/.sbt/repositories).
 
  My feeling in this case is that Spark's version of Ivy somehow doesn't
  handle that case.
 
  So, long story short:
 
  - Has anyone run into either of these problems?
  - Is it possible to set some env variable or something during tests to
 force
  them to use their own directory instead of messing up and breaking my
  ~/.ivy2?
 
 
  --
  Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





-- 
Marcelo


Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Marcelo Vanzin
Here's one of the types of exceptions I get (this one when running
VersionsSuite from sql/hive):

[info] - 13: create client *** FAILED *** (1 second, 946 milliseconds)
[info]   java.lang.RuntimeException: [download failed:
org.apache.httpcomponents#httpclient;4.2.5!httpclient.jar, download failed:
commons-codec#commons-codec;1.4!commons-codec.jar]
[info]   at
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:978)

This is the content of the ivy metadata file for that component:

#ivy cached data file for org.apache.httpcomponents#httpclient;4.2.5
#Thu Jun 04 13:26:10 PDT 2015
artifact\:ivy\#ivy\#xml\#1855381640.is-local=true
artifact\:ivy\#ivy\#xml\#1855381640.location=file\:/home/vanzin/.m2/repository/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.pom
artifact\:ivy\#ivy\#xml\#1855381640.exists=true
resolver=local-m2-cache
artifact\:httpclient\#pom.original\#pom\#-365933676.original=artifact\:httpclient\#pom.original\#pom\#-365933676
artifact\:ivy\#ivy\#xml\#1855381640.original=artifact\:httpclient\#pom.original\#pom\#-365933676
artifact.resolver=local-m2-cache
artifact\:httpclient\#pom.original\#pom\#-365933676.is-local=true
artifact\:httpclient\#pom.original\#pom\#-365933676.location=file\:/home/vanzin/.m2/repository/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.pom
artifact\:httpclient\#pom.original\#pom\#-365933676.exists=true


If I delete that file *and* the maven copy of those artifacts, then the
tests pass. But that's really annoying, since I have to use sbt and maven
for different things and I really like the fact that sbt can read the maven
cache directly.
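A hedged sketch of the narrower workaround (the paths follow the failed
artifacts above; adjust them to whatever shows up in your own logs):

  rm -rf ~/.ivy2/cache/org.apache.httpcomponents/httpclient
  rm -rf ~/.ivy2/cache/commons-codec/commons-codec
  # plus the local maven copies that the bad metadata points at
  rm -rf ~/.m2/repository/org/apache/httpcomponents/httpclient/4.2.5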


On Thu, Jun 4, 2015 at 10:23 AM, shane knapp skn...@berkeley.edu wrote:

 interesting...  i definitely haven't seen it happen that often in our
 build system, and when it has happened, i wasn't able to determine the
 cause.

 On Thu, Jun 4, 2015 at 10:16 AM, Marcelo Vanzin van...@cloudera.com
 wrote:

 On Thu, Jun 4, 2015 at 10:04 AM, shane knapp skn...@berkeley.edu wrote:

 this has occasionally happened on our jenkins as well (twice since last
 august), and deleting the cache fixes it right up.


 Yes deleting the cache fixes things, but it's kinda annoying to have to
 do that. And yesterday when I was testing a patch that actually used the
 ivy feature, I had to do that multiple times... that slows things down a
 lot.



 On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote:

 I've definitely seen the dependency path must be relative problem,
 and fixed it by deleting the ivy cache, but I don't know more than
 this.

 On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hey all,
 
  I've been bit by something really weird lately and I'm starting to
 think
  it's related to the ivy support we have in Spark, and running unit
 tests
  that use that code.
 
  The first thing that happens is that after running unit tests,
 sometimes my
  sbt builds start failing with error saying something about
 dependency path
  must be relative (sorry, don't have the exact error around). The
 dependency
  path it prints is a file: URL.
 
  I have a feeling that this is because Spark uses Ivy 2.4 while sbt
 uses Ivy
  2.3, and those might be incompatible. So if they get mixed up, things
 can
  break.
 
  The second is that sometimes unit tests fail with some weird error
  downloading dependencies. When checking the ivy metadata in
 ~/.ivy2/cache,
  the offending dependencies are pointing to my local maven repo (I have
  maven-local as one of the entries in my ~/.sbt/repositories).
 
  My feeling in this case is that Spark's version of Ivy somehow doesn't
  handle that case.
 
  So, long story short:
 
  - Has anyone run into either of these problems?
  - Is it possible to set some env variable or something during tests
 to force
  them to use their own directory instead of messing up and breaking my
  ~/.ivy2?
 
 
  --
  Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





 --
 Marcelo





-- 
Marcelo


Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-04 Thread Matei Zaharia
+1 

Tested on Mac OS X

 On Jun 4, 2015, at 1:09 PM, Patrick Wendell pwend...@gmail.com wrote:
 
 I will give +1 as well.
 
 On Wed, Jun 3, 2015 at 11:59 PM, Reynold Xin r...@databricks.com wrote:
 Let me give you the 1st
 
 +1
 
 
 
 On Tue, Jun 2, 2015 at 10:47 PM, Patrick Wendell pwend...@gmail.com wrote:
 
  Hi all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The
 exact commit and all other information is correct. (thanks Shivaram
 who pointed this out).
 
 On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!
 
 The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 22596c534a38cfdda91aef18aa9037ab101e4251
 
 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/
 
 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc
 
 The staging repository for this release can be found at:
 [published as version: 1.4.0]
 https://repository.apache.org/content/repositories/orgapachespark-/
 [published as version: 1.4.0-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1112/
 
 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/
 
 Please vote on releasing this package as Apache Spark 1.4.0!
 
 The vote is open until Saturday, June 06, at 05:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.
 
 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...
 
 To learn more about Apache Spark, please see
 http://spark.apache.org/
 
 == What has changed since RC3 ==
  In addition to many smaller fixes, three blocker issues were fixed:
 4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
 metadataHive get constructed too early
 6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
 78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton
 
 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.
 
 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Ivy support in Spark vs. sbt

2015-06-04 Thread shane knapp
this has occasionally happened on our jenkins as well (twice since last
august), and deleting the cache fixes it right up.

On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote:

 I've definitely seen the dependency path must be relative problem,
 and fixed it by deleting the ivy cache, but I don't know more than
 this.

 On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hey all,
 
  I've been bit by something really weird lately and I'm starting to think
  it's related to the ivy support we have in Spark, and running unit tests
  that use that code.
 
  The first thing that happens is that after running unit tests, sometimes
 my
  sbt builds start failing with error saying something about dependency
 path
  must be relative (sorry, don't have the exact error around). The
 dependency
  path it prints is a file: URL.
 
  I have a feeling that this is because Spark uses Ivy 2.4 while sbt uses
 Ivy
  2.3, and those might be incompatible. So if they get mixed up, things can
  break.
 
  The second is that sometimes unit tests fail with some weird error
  downloading dependencies. When checking the ivy metadata in
 ~/.ivy2/cache,
  the offending dependencies are pointing to my local maven repo (I have
  maven-local as one of the entries in my ~/.sbt/repositories).
 
  My feeling in this case is that Spark's version of Ivy somehow doesn't
  handle that case.
 
  So, long story short:
 
  - Has anyone run into either of these problems?
  - Is it possible to set some env variable or something during tests to
 force
  them to use their own directory instead of messing up and breaking my
  ~/.ivy2?
 
 
  --
  Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Anyone facing problem in incremental building of individual project

2015-06-04 Thread Ted Yu
Andrew Or put in this workaround :

diff --git a/pom.xml b/pom.xml
index 0b1aaad..d03d33b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1438,6 +1438,8 @@
 <version>2.3</version>
 <configuration>
   <shadedArtifactAttached>false</shadedArtifactAttached>
+  <!-- Work around MSHADE-148 -->
+  <createDependencyReducedPom>false</createDependencyReducedPom>
   <artifactSet>
     <includes>
       <!-- At a minimum we must include this to force effective
pom generation -->

FYI

On Thu, Jun 4, 2015 at 6:25 AM, Steve Loughran ste...@hortonworks.com
wrote:


  On 4 Jun 2015, at 11:16, Meethu Mathew meethu.mat...@flytxt.com wrote:

  Hi all,

  I added some new code to MLlib. When I am trying to build only the
 mllib project using mvn --projects mllib/ -DskipTests clean install
 after setting export SPARK_PREPEND_CLASSES=true, the build is getting
 stuck with the following message.



  Excluding org.jpmml:pmml-schema:jar:1.1.15 from the shaded jar.
 [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.7 from the shaded jar.
 [INFO] Excluding com.sun.xml.bind:jaxb-core:jar:2.2.7 from the shaded jar.
 [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.7 from the shaded jar.
 [INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded
 jar.
 [INFO] Excluding org.scala-lang:scala-reflect:jar:2.10.4 from the shaded
 jar.
 [INFO] Replacing original artifact with shaded artifact.
 [INFO] Replacing
 /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT.jar
 with
 /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT-shaded.jar
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml

.



  I've seen something similar in a different build,

  It looks like MSHADE-148:
 https://issues.apache.org/jira/browse/MSHADE-148
 if you apply Tom White's patch, does your problem go away?



Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Marcelo Vanzin
They're my local builds, so I wouldn't be able to send you any links... and
the error is generally from sbt, not the unit tests. But if there's any
info I can collect when I see the error, let me know.

I'll try spark.jars.ivy. I wonder if we should just set that as a
system property in Spark's root pom.
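As a sketch of what I'll try (assuming spark.jars.ivy redirects the cache the
way Burak describes; the directory and package coordinate are arbitrary
examples):

  ./bin/spark-shell --conf spark.jars.ivy=/tmp/spark-ivy-test \
    --packages com.databricks:spark-csv_2.10:1.0.3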

On Thu, Jun 4, 2015 at 9:47 AM, Burak Yavuz brk...@gmail.com wrote:

 Hi Marcelo,

 This is interesting. Can you please send me links to any failing builds if
 you see that problem please. For now you can set a conf: `spark.jars.ivy`
 to use a path except `~/.ivy2` for Spark.

 Thanks,
 Burak

 On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote:

 I've definitely seen the dependency path must be relative problem,
 and fixed it by deleting the ivy cache, but I don't know more than
 this.

 On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hey all,
 
  I've been bit by something really weird lately and I'm starting to think
  it's related to the ivy support we have in Spark, and running unit tests
  that use that code.
 
  The first thing that happens is that after running unit tests,
 sometimes my
  sbt builds start failing with error saying something about dependency
 path
  must be relative (sorry, don't have the exact error around). The
 dependency
  path it prints is a file: URL.
 
  I have a feeling that this is because Spark uses Ivy 2.4 while sbt uses
 Ivy
  2.3, and those might be incompatible. So if they get mixed up, things
 can
  break.
 
  The second is that sometimes unit tests fail with some weird error
  downloading dependencies. When checking the ivy metadata in
 ~/.ivy2/cache,
  the offending dependencies are pointing to my local maven repo (I have
  maven-local as one of the entries in my ~/.sbt/repositories).
 
  My feeling in this case is that Spark's version of Ivy somehow doesn't
  handle that case.
 
  So, long story short:
 
  - Has anyone run into either of these problems?
  - Is it possible to set some env variable or something during tests to
 force
  them to use their own directory instead of messing up and breaking my
  ~/.ivy2?
 
 
  --
  Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





-- 
Marcelo


Fwd: How to pass system properties in spark ?

2015-06-04 Thread Ashwin Shankar
Trying the spark-dev mailing list to see if anyone knows.

-- Forwarded message --
From: Ashwin Shankar ashwinshanka...@gmail.com
Date: Wed, Jun 3, 2015 at 5:38 PM
Subject: How to pass system properties in spark ?
To: u...@spark.apache.org u...@spark.apache.org


Hi,
I'm trying to use property substitution in my log4j.properties, so that
I can choose where to write Spark logs at runtime.
The problem is that the system property passed to spark-shell
doesn't seem to be getting propagated to log4j.

Here is log4j.properties (partial) with a parameter 'spark.log.path':
log4j.appender.logFile=org.apache.log4j.FileAppender
log4j.appender.logFile.File=${spark.log.path}
log4j.appender.logFile.layout=org.apache.log4j.PatternLayout
log4j.appender.logFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p
%c{1}: %m%n

Here is how I pass the 'spark.log.path' variable on the command line:
$spark-shell --conf
spark.driver.extraJavaOptions=-Dspark.log.path=/tmp/spark.log

I also tried:
$spark-shell -Dspark.log.path=/tmp/spark.log

Result: /tmp/spark.log is not getting created when I run Spark.

Any ideas why this is happening?

When I enable log4j debug I see the following:
log4j: Setting property [file] to [].
log4j: setFile called: , true
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException:  (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
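One variant worth trying (a hedged sketch; --driver-java-options is the
documented spark-shell/spark-submit flag for passing JVM options to the
driver, so the property should be set before log4j initializes there):

  $ spark-shell --driver-java-options "-Dspark.log.path=/tmp/spark.log"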

-- 
Thanks,
Ashwin





-- 
Thanks,
Ashwin


Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Burak Yavuz
Hi Marcelo,

This is interesting. Can you please send me links to any failing builds if
you see that problem please. For now you can set a conf: `spark.jars.ivy`
to use a path except `~/.ivy2` for Spark.

Thanks,
Burak

On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote:

 I've definitely seen the dependency path must be relative problem,
 and fixed it by deleting the ivy cache, but I don't know more than
 this.

 On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hey all,
 
  I've been bit by something really weird lately and I'm starting to think
  it's related to the ivy support we have in Spark, and running unit tests
  that use that code.
 
  The first thing that happens is that after running unit tests, sometimes
 my
  sbt builds start failing with error saying something about dependency
 path
  must be relative (sorry, don't have the exact error around). The
 dependency
  path it prints is a file: URL.
 
  I have a feeling that this is because Spark uses Ivy 2.4 while sbt uses
 Ivy
  2.3, and those might be incompatible. So if they get mixed up, things can
  break.
 
  The second is that sometimes unit tests fail with some weird error
  downloading dependencies. When checking the ivy metadata in
 ~/.ivy2/cache,
  the offending dependencies are pointing to my local maven repo (I have
  maven-local as one of the entries in my ~/.sbt/repositories).
 
  My feeling in this case is that Spark's version of Ivy somehow doesn't
  handle that case.
 
  So, long story short:
 
  - Has anyone run into either of these problems?
  - Is it possible to set some env variable or something during tests to
 force
  them to use their own directory instead of messing up and breaking my
  ~/.ivy2?
 
 
  --
  Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Where is the JIRA filter for new contributors?

2015-06-04 Thread Sean Owen
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

... which contains ...

https://issues.apache.org/jira/browse/SPARK-7993?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)

On Thu, Jun 4, 2015 at 5:45 PM, Ravi Desai rd7...@gmail.com wrote:
 I am new to Spark and would like to contribute to it.  I recall seeing
  somewhere on the website a link to a JIRA filter for new contributors, but
 can't find that anymore.  Could someone point me to it?

 Thanks,
 -ravi

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Where is the JIRA filter for new contributors?

2015-06-04 Thread Ravi Desai
I am new to Spark and would like to contribute to it.  I recall seeing 
somewhere on the website a link to a JIRA filter for new contributors,
but can't find that anymore.  Could someone point me to it?


Thanks,
-ravi

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Spark Packages: using sbt-spark-package tool with R

2015-06-04 Thread Chris Freeman
Hey everyone,

I’m looking to develop a package for use with SparkR. This package would 
include custom R and Scala code and I was wondering if anyone had any insight 
into how I might be able to use the sbt-spark-package tool to publish something 
that needs to include an R package as well as a JAR created via SBT assembly.  
I know there’s an existing option for including Python files but I haven’t been 
able to crack the code on how I might be able to include R files.
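In the meantime, a hedged sketch of a plain-sbt fallback (not an
sbt-spark-package feature): bundle the R sources into the jar as resources so
they ship with the assembly; the directory name below is a placeholder.

  // build.sbt
  unmanagedResourceDirectories in Compile += baseDirectory.value / "src" / "main" / "r"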

Any advice is appreciated!

-Chris Freeman



Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-04 Thread Reynold Xin
Let me give you the 1st

+1



On Tue, Jun 2, 2015 at 10:47 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hi all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The
 exact commit and all other information is correct. (thanks Shivaram
 who pointed this out).

 On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
 1.4.0!
 
  The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  22596c534a38cfdda91aef18aa9037ab101e4251
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.0]
  https://repository.apache.org/content/repositories/orgapachespark-/
  [published as version: 1.4.0-rc4]
  https://repository.apache.org/content/repositories/orgapachespark-1112/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.0!
 
  The vote is open until Saturday, June 06, at 05:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What has changed since RC3 ==
  In addition to many smaller fixes, three blocker issues were fixed:
  4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
  metadataHive get constructed too early
  6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
  78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton
 
  == How can I help test this release? ==
  If you are a Spark user, you can help us test this release by
  taking a Spark 1.3 workload and running on this release candidate,
  then reporting any regressions.
 
  == What justifies a -1 vote for this release? ==
  This vote is happening towards the end of the 1.4 QA period,
  so -1 votes should only occur for significant regressions from 1.3.1.
  Bugs already present in 1.3.X, minor regressions, or bugs related
  to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




RE: MLlib: Anybody working on hierarchical topic models like HLDA?

2015-06-04 Thread Yang, Yuhao
Hi DB Tsai,

Not for now. My primary reference is 
http://jmlr.csail.mit.edu/proceedings/papers/v15/wang11a/wang11a.pdf .

And I'm seeking a way to maximize code reuse. Any suggestion will be welcome.
Thanks.

Regards,
yuhao

-Original Message-
From: DB Tsai [mailto:dbt...@dbtsai.com] 
Sent: Thursday, June 4, 2015 1:01 PM
To: Yang, Yuhao
Cc: Joseph Bradley; Lorenz Fischer; dev@spark.apache.org
Subject: Re: MLlib: Anybody working on hierarchical topic models like HLDA?

Is your HDP implementation based on distributed Gibbs sampling? Thanks.

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com


On Wed, Jun 3, 2015 at 8:13 PM, Yang, Yuhao yuhao.y...@intel.com wrote:
 Hi Lorenz,



   I’m trying to build a prototype of HDP for a customer based on the 
 current LDA implementations. An initial version will probably be ready 
 within the next one or two weeks. I’ll share it and hopefully we can join 
 forces.



   One concern is that I’m not sure how widely it will be used in the 
 industry or community. Hope it’s popular enough to be accepted by 
 Spark MLlib.



 http://www.cs.berkeley.edu/~jordan/papers/hierarchical-dp.pdf

 http://jmlr.csail.mit.edu/proceedings/papers/v15/wang11a/wang11a.pdf



 Regards,

 Yuhao



 From: Joseph Bradley [mailto:jos...@databricks.com]
 Sent: Thursday, June 4, 2015 7:17 AM
 To: Lorenz Fischer
 Cc: dev@spark.apache.org
 Subject: Re: MLlib: Anybody working on hierarchical topic models like HLDA?



 Hi Lorenz,



 I'm not aware of people working on hierarchical topic models for 
 MLlib, but that would be cool to see.  Hopefully other devs know more!



 Glad that the current LDA is helpful!



 Joseph



 On Wed, Jun 3, 2015 at 6:43 AM, Lorenz Fischer 
 lorenz.fisc...@gmail.com
 wrote:

 Hi All



 I'm working on a project in which I use the current LDA implementation 
 that has been contributed by Databricks' Joseph Bradley et al. for the 
 recent
 1.3.0 release (thanks guys!). While this is great, my project requires 
 several levels of topics, as I would like to let users drill down
 into subtopics.



 As I understand it, Hierarchical Latent Dirichlet Allocation (HLDA) 
 would offer such a hierarchy. Looking at the papers and talks by Blei 
 [1,2] and Jordan [3], I think I should be able to implement HLDA in 
 Spark using the Nested Chinese Restaurant Process (NCRP). However, as 
 I have some time constraints, I'm not sure if I will have the time to do it 
 'the proper way'.



 In any case, I wanted to quickly ask around if anybody is already 
 working on this or on some other form of a hierarchical topic model. 
 Maybe I could contribute to these efforts instead of starting from scratch.



 Best,

 Lorenz



 [1] 
 http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009.pdf

 [2]
 http://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nes
 ted-chinese-restaurant-process.pdf

 [3] https://www.youtube.com/watch?v=PxgW3lOrj60




Anyone facing problem in incremental building of individual project

2015-06-04 Thread Meethu Mathew
Hi all,

I added some new code to MLlib. When I am trying to build only the mllib
project using mvn --projects mllib/ -DskipTests clean install after setting
export SPARK_PREPEND_CLASSES=true, the build is getting stuck with the
following message.



  Excluding org.jpmml:pmml-schema:jar:1.1.15 from the shaded jar.
 [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.7 from the shaded jar.
 [INFO] Excluding com.sun.xml.bind:jaxb-core:jar:2.2.7 from the shaded jar.
 [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.7 from the shaded jar.
 [INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded
 jar.
 [INFO] Excluding org.scala-lang:scala-reflect:jar:2.10.4 from the shaded
 jar.
 [INFO] Replacing original artifact with shaded artifact.
 [INFO] Replacing
 /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT.jar
 with
 /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT-shaded.jar
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml

   .

But a full build completes as usual. Please help if anyone is facing the
same issue.

Regards,

Meethu Mathew
Senior Engineer
Flytxt


Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Sean Owen
I've definitely seen the "dependency path must be relative" problem,
and fixed it by deleting the ivy cache, but I don't know more than
this.

On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com wrote:
 Hey all,

 I've been bit by something really weird lately and I'm starting to think
 it's related to the ivy support we have in Spark, and running unit tests
 that use that code.

 The first thing that happens is that after running unit tests, sometimes my
 sbt builds start failing with error saying something about dependency path
 must be relative (sorry, don't have the exact error around). The dependency
 path it prints is a file: URL.

 I have a feeling that this is because Spark uses Ivy 2.4 while sbt uses Ivy
 2.3, and those might be incompatible. So if they get mixed up, things can
 break.

 The second is that sometimes unit tests fail with some weird error
 downloading dependencies. When checking the ivy metadata in ~/.ivy2/cache,
 the offending dependencies are pointing to my local maven repo (I have
 maven-local as one of the entries in my ~/.sbt/repositories).

 My feeling in this case is that Spark's version of Ivy somehow doesn't
 handle that case.

 So, long story short:

 - Has anyone run into either of these problems?
 - Is it possible to set some env variable or something during tests to force
 them to use their own directory instead of messing up and breaking my
 ~/.ivy2?


 --
 Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-04 Thread Patrick Wendell
I will give +1 as well.

On Wed, Jun 3, 2015 at 11:59 PM, Reynold Xin r...@databricks.com wrote:
 Let me give you the 1st

 +1



 On Tue, Jun 2, 2015 at 10:47 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hi all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The
 exact commit and all other information is correct. (thanks Shivaram
 who pointed this out).

 On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
  1.4.0!
 
  The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  22596c534a38cfdda91aef18aa9037ab101e4251
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.0]
  https://repository.apache.org/content/repositories/orgapachespark-/
  [published as version: 1.4.0-rc4]
  https://repository.apache.org/content/repositories/orgapachespark-1112/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.0!
 
  The vote is open until Saturday, June 06, at 05:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What has changed since RC3 ==
  In addition to many smaller fixes, three blocker issues were fixed:
  4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
  metadataHive get constructed too early
  6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
  78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton
 
  == How can I help test this release? ==
  If you are a Spark user, you can help us test this release by
  taking a Spark 1.3 workload and running on this release candidate,
  then reporting any regressions.
 
  == What justifies a -1 vote for this release? ==
  This vote is happening towards the end of the 1.4 QA period,
  so -1 votes should only occur for significant regressions from 1.3.1.
  Bugs already present in 1.3.X, minor regressions, or bugs related
  to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Anyone facing problem in incremental building of individual project

2015-06-04 Thread Steve Loughran

On 4 Jun 2015, at 11:16, Meethu Mathew meethu.mat...@flytxt.com wrote:

Hi all,

I added some new code to MLlib. When I am trying to build only the mllib
project using mvn --projects mllib/ -DskipTests clean install after setting
export SPARK_PREPEND_CLASSES=true, the build is getting stuck with the
following message.


 Excluding org.jpmml:pmml-schema:jar:1.1.15 from the shaded jar.
[INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.7 from the shaded jar.
[INFO] Excluding com.sun.xml.bind:jaxb-core:jar:2.2.7 from the shaded jar.
[INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.7 from the shaded jar.
[INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded jar.
[INFO] Excluding org.scala-lang:scala-reflect:jar:2.10.4 from the shaded jar.
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing 
/home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT.jar
 with 
/home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT-shaded.jar
[INFO] Dependency-reduced POM written at: 
/home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
[INFO] Dependency-reduced POM written at: 
/home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
[INFO] Dependency-reduced POM written at: 
/home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
[INFO] Dependency-reduced POM written at: 
/home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
   .


I've seen something similar in a different build,

It looks like MSHADE-148: https://issues.apache.org/jira/browse/MSHADE-148
if you apply Tom White's patch, does your problem go away?