Re: Too late to contribute for 1.1.0?
I believe docs changes can go in anytime (because we can just publish new versions of the docs). Critical bug fixes can still go in too.

On Thu, Aug 21, 2014 at 11:43 PM, Evan Chan velvia.git...@gmail.com wrote:
I'm hoping to get in some doc enhancements and small bug fixes for Spark SQL. Also possibly a small new API to list the tables in a sqlContext. Oh, and to get in the doc page I had talked about before: a list of community Spark projects.
thanks,
Evan
Adding support for a new object store
I am new to the Spark source code and am looking to see if I can add push-down support for Spark filters to the storage layer (in my case an object store). I am willing to consider how this can be done generically for any store that we might want to integrate with Spark. I would like to know which areas I should look into to provide support for a new data store in this context. Here are some of the questions I have to start with (a sketch addressing question 1 follows this message):

1. Do we need to create a new RDD class for the new store that we want to support? From the SparkContext, we create an RDD, and the operations on the data, including the filter, are performed through the RDD methods.

2. When we specify the code for the filter task in the RDD.filter() method, how does it get communicated to the Executor on the data node? Does the Executor need to compile this code on the fly and execute it, or how does it work? (I have looked at the code for some time but have not yet figured this out, so I am looking for some pointers that can help me come up to speed on this part of the code.)

3. How long does the Executor hold the memory, and how does it decide when to release the memory/cache?

Thank you in advance.

Regards,
Rajendran.
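On question 1, a custom RDD is indeed the usual entry point. A minimal Scala sketch, assuming a hypothetical object-store client (ObjectStoreRDD, StorePartition, numPartitions, and pushedFilter are all illustrative names, not an existing Spark API):

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // One Partition per independently readable chunk of the store.
    case class StorePartition(index: Int) extends Partition

    class ObjectStoreRDD(
        sc: SparkContext,
        bucket: String,
        numPartitions: Int,
        pushedFilter: Option[String] = None) // a predicate the store can evaluate
      extends RDD[String](sc, Nil) {

      // Called on the driver: describe how the input splits into partitions.
      override protected def getPartitions: Array[Partition] =
        Array.tabulate[Partition](numPartitions)(i => StorePartition(i))

      // Called on the executor for each partition: open the store connection
      // for this split and pass pushedFilter down so the store filters
      // server-side. Stubbed here; a real version would stream records lazily.
      override def compute(split: Partition, context: TaskContext): Iterator[String] =
        Iterator.empty
    }

On question 2, no compilation happens on the fly: the function passed to RDD.filter() is a closure that is serialized on the driver, shipped to the executors as part of the task, and deserialized and applied there.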
take() reads every partition if the first one is empty
On line 777 (https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771), the logic for take() reads ALL partitions if the first one (or first k) are empty. This has actually led to OOMs when we had many partitions (thousands) and, unfortunately, the first one was empty. Wouldn't a better implementation strategy be

    numPartsToTry = partsScanned * 2

instead of

    numPartsToTry = totalParts - 1

(this doubling is similar to most memory allocation strategies)?

Thanks!
- Paul
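For reference, a runnable Scala sketch of the proposed strategy. This is a simplification, not the actual RDD.take code: local iterators stand in for the runJob calls, and takeSketch is an illustrative name.

    // Scan partitions in growing rounds, doubling the batch size each time,
    // instead of jumping from 1 partition straight to all of them.
    def takeSketch[T](parts: IndexedSeq[Iterator[T]], num: Int): Seq[T] = {
      val buf = scala.collection.mutable.ArrayBuffer.empty[T]
      val totalParts = parts.length
      var partsScanned = 0
      while (buf.size < num && partsScanned < totalParts) {
        // First round reads 1 partition; later rounds read partsScanned * 2,
        // so the total scanned grows geometrically rather than all at once.
        val numPartsToTry = if (partsScanned == 0) 1 else partsScanned * 2
        val upTo = math.min(partsScanned + numPartsToTry, totalParts)
        (partsScanned until upTo).foreach { p =>
          buf ++= parts(p).take(num - buf.size)
        }
        partsScanned = upTo
      }
      buf.take(num)
    }

    // Example: the first partition is empty, so a second round reads the next one.
    takeSketch(Vector(Iterator.empty, (1 to 1000).iterator), 5)
    // => Seq(1, 2, 3, 4, 5)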
Re: take() reads every partition if the first one is empty
Hi Paul,

I agree that jumping straight from reading N rows from 1 partition to N rows from ALL partitions is pretty aggressive. The exponential growth strategy of doubling the partition count every round seems better -- 1, 2, 4, 8, 16, ... is much more likely to prevent OOMs than the 1-to-ALL strategy.

Andrew

On Fri, Aug 22, 2014 at 9:50 AM, pnepywoda pnepyw...@palantir.com wrote:
On line 777 the logic for take() reads ALL partitions if the first one (or first k) are empty. [...]
Re: take() reads every partition if the first one is empty
Yep, anyone can create a bug at https://issues.apache.org/jira/browse/SPARK. Then, if you make a pull request on GitHub with the bug number in the header, like "[SPARK-1234] Make take() less OOM-prone", the PR gets linked to the JIRA ticket. I think that's the best way to get feedback on a fix.

On Fri, Aug 22, 2014 at 12:52 PM, pnepywoda pnepyw...@palantir.com wrote:
What's the process at this point? Does someone make a bug? Should I make a bug? (Do I even have permission to?)
Graphx GraphLoader Coalesce Shuffle
Hey all,

I’ve often found that my Spark programs run much more stably with a higher number of partitions, and a lot of the graphs I deal with will have a few hundred large part files. I was wondering if having a parameter in GraphLoader, defaulting to false, to set the shuffle parameter in coalesce is something that might be added to GraphX, or if there was a good reason for not including it? I’ve been using this patch myself for a couple of weeks (see the usage sketch after the diff).

—Jeff

diff --git a/graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala b/graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala
index f4c7936..b2f9e9c 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala
@@ -58,13 +58,14 @@ object GraphLoader extends Logging {
       canonicalOrientation: Boolean = false,
       minEdgePartitions: Int = 1,
       edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
-      vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
+      vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
+      shuffle: Boolean = false)
     : Graph[Int, Int] = {
     val startTime = System.currentTimeMillis

     // Parse the edge data table directly into edge partitions
-    val lines = sc.textFile(path, minEdgePartitions).coalesce(minEdgePartitions)
+    val lines = sc.textFile(path, minEdgePartitions).coalesce(minEdgePartitions, shuffle)
     val edges = lines.mapPartitionsWithIndex { (pid, iter) =>
       val builder = new EdgePartitionBuilder[Int, Int]
       iter.foreach { line =>
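With the patch applied, a caller could then opt into the shuffle like this (a hypothetical call: the shuffle parameter only exists with the diff above, and the path and partition count are just examples):

    val graph = GraphLoader.edgeListFile(sc, "hdfs:///graphs/edges.txt",
      minEdgePartitions = 400, shuffle = true)

Passing shuffle = true makes the coalesce repartition with a full shuffle, which evens out partition sizes at the cost of moving the edge data across the network; the default of false keeps today's behavior.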
reference to dstream in package org.apache.spark.streaming which is not available
Hi,

Using the following command on (refreshed) master branch:

    mvn clean package -DskipTests

I got:

constituent[36]: file:/homes/hortonzy/apache-maven-3.1.1/conf/logging/
---
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in TestSuiteBase.class refers to term dstream in package org.apache.spark.streaming which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling TestSuiteBase.class.
    at scala.reflect.internal.pickling.UnPickler$Scan.toTypeError(UnPickler.scala:847)
    at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef.complete(UnPickler.scala:854)
    at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1231)
    at scala.reflect.internal.Types$TypeMap$$anonfun$noChangeToSymbols$1.apply(Types.scala:4280)
    at scala.reflect.internal.Types$TypeMap$$anonfun$noChangeToSymbols$1.apply(Types.scala:4280)
    at scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:70)
    at scala.collection.immutable.List.forall(List.scala:84)
    at scala.reflect.internal.Types$TypeMap.noChangeToSymbols(Types.scala:4280)
    at scala.reflect.internal.Types$TypeMap.mapOver(Types.scala:4293)
    at scala.reflect.internal.Types$TypeMap.mapOver(Types.scala:4196)
    at scala.reflect.internal.Types$AsSeenFromMap.apply(Types.scala:4638)
    at scala.reflect.internal.Types$TypeMap.mapOver(Types.scala:4202)
    at scala.reflect.internal.Types$AsSeenFromMap.apply(Types.scala:4638)
    at scala.reflect.internal.Types$Type.asSeenFrom(Types.scala:754)
    at scala.reflect.internal.Types$Type.memberInfo(Types.scala:773)
    at xsbt.ExtractAPI.defDef(ExtractAPI.scala:224)
    at xsbt.ExtractAPI.xsbt$ExtractAPI$$definition(ExtractAPI.scala:315)
    at xsbt.ExtractAPI$$anonfun$xsbt$ExtractAPI$$processDefinitions$1.apply(ExtractAPI.scala:296)
    at xsbt.ExtractAPI$$anonfun$xsbt$ExtractAPI$$processDefinitions$1.apply(ExtractAPI.scala:296)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:108)
    at xsbt.ExtractAPI.xsbt$ExtractAPI$$processDefinitions(ExtractAPI.scala:296)
    at xsbt.ExtractAPI$$anonfun$mkStructure$4.apply(ExtractAPI.scala:293)
    at xsbt.ExtractAPI$$anonfun$mkStructure$4.apply(ExtractAPI.scala:293)
    at xsbt.Message$$anon$1.apply(Message.scala:8)
    at xsbti.SafeLazy$$anonfun$apply$1.apply(SafeLazy.scala:8)
    at xsbti.SafeLazy$Impl._t$lzycompute(SafeLazy.scala:20)
    at xsbti.SafeLazy$Impl._t(SafeLazy.scala:18)
    at xsbti.SafeLazy$Impl.get(SafeLazy.scala:24)
    at xsbt.ExtractAPI$$anonfun$forceStructures$1.apply(ExtractAPI.scala:138)
    at xsbt.ExtractAPI$$anonfun$forceStructures$1.apply(ExtractAPI.scala:138)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at xsbt.ExtractAPI.forceStructures(ExtractAPI.scala:138)
    at xsbt.ExtractAPI.forceStructures(ExtractAPI.scala:139)
    at xsbt.API$ApiPhase.processScalaUnit(API.scala:54)
    at xsbt.API$ApiPhase.processUnit(API.scala:38)
    at xsbt.API$ApiPhase$$anonfun$run$1.apply(API.scala:34)
    at xsbt.API$ApiPhase$$anonfun$run$1.apply(API.scala:34)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at xsbt.API$ApiPhase.run(API.scala:34)
    at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583)
    at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557)
    at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553)
    at scala.tools.nsc.Global$Run.compile(Global.scala:1662)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:123)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:99)
    at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at
[Spark SQL] off-heap columnar store
Hey guys,

What is the plan for getting Tachyon/off-heap support for the columnar compressed store? It's not in 1.1, is it? In particular:

- being able to set TACHYON as the caching mode
- loading of hot columns or all columns
- write-through of columnar store data to HDFS or a backing store
- being able to start a context and query directly from Tachyon's cached columnar data

I think most of this was in Shark 0.9.1. Also, how likely is the wire format for the columnar compressed data to change? That would be a problem for write-through or persistence.

thanks,
Evan
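For context, plain RDDs can already be persisted off-heap to Tachyon in 1.x; the ask above is the analogue for the cached columnar representation. A sketch of the RDD-level version, assuming a Tachyon master at the default local address (the URL and input path are placeholders, and the spark.tachyonStore.url key is recalled from the 1.x docs, so double-check it):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("OffHeapCacheSketch")
      .set("spark.tachyonStore.url", "tachyon://localhost:19998")
    val sc = new SparkContext(conf)

    // OFF_HEAP stores the cached blocks in Tachyon instead of on the JVM heap.
    val data = sc.textFile("hdfs:///some/table")
    data.persist(StorageLevel.OFF_HEAP)
    println(data.count())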
Re: reference to dstream in package org.apache.spark.streaming which is not available
Yes, master hasn't compiled for me for a few days. It's fixed in:
https://github.com/apache/spark/pull/1726
https://github.com/apache/spark/pull/2075

Could a committer sort this out?

Sean

On Fri, Aug 22, 2014 at 9:55 PM, Ted Yu yuzhih...@gmail.com wrote:
Hi, Using the following command on (refreshed) master branch: mvn clean package -DskipTests I got: [...] Caused by: scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in TestSuiteBase.class refers to term dstream in package org.apache.spark.streaming which is not available. [...]
Re: reference to dstream in package org.apache.spark.streaming which is not available
Sean - I think only the ones in 1726 are enough. It is weird that any class that uses the test-jar actually requires the streaming jar to be added explicitly. Shouldn't Maven take care of this? I posted some comments on the PR.

--
Thanks, Hari

Sean Owen so...@cloudera.com wrote on August 22, 2014 at 3:58 PM:
Yes, master hasn't compiled for me for a few days. It's fixed in:
https://github.com/apache/spark/pull/1726
https://github.com/apache/spark/pull/2075
Could a committer sort this out? Sean

Ted Yu yuzhih...@gmail.com wrote on August 22, 2014 at 1:55 PM:
Hi, Using the following command on (refreshed) master branch: mvn clean package -DskipTests I got: [...] bad symbolic reference. A signature in TestSuiteBase.class refers to term dstream in package org.apache.spark.streaming which is not available. [...]
Re: Spark Contribution
Great idea. Added the link to https://github.com/apache/spark/blob/master/README.md

On Thu, Aug 21, 2014 at 4:06 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:
We should add this link to the readme on GitHub btw.

On Thursday, August 21, 2014, Henry Saputra henry.sapu...@gmail.com wrote:
The Apache Spark wiki on how to contribute should be a great place to start:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
- Henry

On Thu, Aug 21, 2014 at 3:25 AM, Maisnam Ns maisnam...@gmail.com wrote:
Hi, Can someone help me with some links on how to contribute to Spark?
Regards
mns
Graphx seems to be broken while Creating a large graph(6B nodes in my case)
While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is an overflow somewhere (maybe we are using an Int for some field?). In fact, the returned values below are exactly the inputs truncated to 32 bits: 6101995593 - 2^32 = 1807028297, and 2157586441 - 2^32 = -2137380855.

Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.

Here are some details of the experiments I have done so far:

1. Input: numNodes=6101995593 ; noEdges=12163784626
   Graph returns: numVertices=1807028297 ; numEdges=12163784626
2. Input: numNodes=2157586441 ; noEdges=2747322705
   Graph returns: numVertices=-2137380855 ; numEdges=2747322705
3. Input: numNodes=1725060105 ; noEdges=204176821
   Graph returns: numVertices=1725060105 ; numEdges=2041768213

You can find the code to reproduce this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf

(I have also filed this JIRA ticket: https://issues.apache.org/jira/browse/SPARK-3190)
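A quick Scala check of the overflow hypothesis, using the counts from experiments 1 and 2 above: if numVertices is computed or stored as a 32-bit Int somewhere, the reported values should equal the low 32 bits of the true counts, and they do.

    // Truncating each true vertex count to an Int reproduces the bad results.
    val trueCounts = Seq(6101995593L, 2157586441L)
    trueCounts.foreach { n =>
      println(s"$n.toInt = ${n.toInt}")
    }
    // 6101995593.toInt = 1807028297    (matches experiment 1)
    // 2157586441.toInt = -2137380855   (matches experiment 2)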
Re: reference to dstream in package org.apache.spark.streaming which is not available
Figured it out. Fixing this ASAP.

TD

On Fri, Aug 22, 2014 at 5:51 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey All, We can sort this out ASAP. Many of the Spark committers were at a company offsite for the last 72 hours, so sorry that it is broken. - Patrick

On Fri, Aug 22, 2014 at 4:07 PM, Hari Shreedharan hshreedha...@cloudera.com wrote:
Sean - I think only the ones in 1726 are enough. It is weird that any class that uses the test-jar actually requires the streaming jar to be added explicitly. Shouldn't Maven take care of this? I posted some comments on the PR. [...]
Re: reference to dstream in package org.apache.spark.streaming which is not available
The real fix is that the Spark Sink suite does not really need to use the spark-streaming test jars. Removing that dependency altogether, and submitting a PR.

TD

On Fri, Aug 22, 2014 at 6:34 PM, Tathagata Das tathagata.das1...@gmail.com wrote:
Figured it out. Fixing this ASAP. TD

On Fri, Aug 22, 2014 at 5:51 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey All, We can sort this out ASAP. Many of the Spark committers were at a company offsite for the last 72 hours, so sorry that it is broken. - Patrick [...]
Re: Spark Contribution
Thanks, all, for adding this link.

On Sat, Aug 23, 2014 at 5:38 AM, Reynold Xin r...@databricks.com wrote:
Great idea. Added the link to https://github.com/apache/spark/blob/master/README.md

On Thu, Aug 21, 2014 at 4:06 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:
We should add this link to the readme on GitHub btw.

On Thursday, August 21, 2014, Henry Saputra henry.sapu...@gmail.com wrote:
The Apache Spark wiki on how to contribute should be a great place to start:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
- Henry

On Thu, Aug 21, 2014 at 3:25 AM, Maisnam Ns maisnam...@gmail.com wrote:
Hi, Can someone help me with some links on how to contribute to Spark?
Regards
mns