Re: spark1.6.2 ClassNotFoundException: org.apache.parquet.hadoop.ParquetOutputCommitter

2016-07-07 Thread Sun Rui
Maybe related to "parquet-provided"? Remove the "parquet-provided" profile when making the distribution, or add the parquet jar to the classpath when running Spark. > On Jul 8, 2016, at 09:25, kevin wrote: > > parquet-provided

Re: Stopping Spark executors

2016-07-07 Thread Mr rty ff
Hi, I am sorry but it's still not clear. Do you mean ./bin/spark-shell --master local? And what do I do after that? Killing the org.apache.spark.deploy.SparkSubmit --master local --class org.apache.spark.repl.Main --name Spark shell spark-shell process will kill the shell, so I couldn't send the commands. Thanks

Re: Bad JIRA components

2016-07-07 Thread Nicholas Chammas
Thanks Reynold. On Thu, Jul 7, 2016 at 5:03 PM Reynold Xin wrote: > I deleted those. > > > On Thu, Jul 7, 2016 at 1:27 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> >>

Re: Bad JIRA components

2016-07-07 Thread Reynold Xin
I deleted those. On Thu, Jul 7, 2016 at 1:27 PM, Nicholas Chammas wrote: > > https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel > > There are several bad components in there, like docs, MLilb, and sq;.

Re: [DISCUSS] Minimize use of MINOR, BUILD, and HOTFIX w/ no JIRA

2016-07-07 Thread Tom Graves
I think the problem comes in with your definition as well as people's interpretation of it. I don't agree with your statement of "where the "how" is different from the "what"". This could apply to a lot of things. I could easily file a jira that says remove synchronization on routine x,

Re: Stopping Spark executors

2016-07-07 Thread Mr rty ff
I don't think it's the proper way to recreate the bug, because I should continue to send commands to the shell. They are talking about killing the CoarseGrainedExecutorBackend. On Thursday, July 7, 2016 11:32 PM, Jacek Laskowski wrote: Hi, It appears you're running local mode

Re: Stopping Spark executors

2016-07-07 Thread Jacek Laskowski
Hi, It appears you're running local mode (local[*] assumed) so killing spark-shell *will* kill the one and only executor -- the driver :) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at

Re: Stopping Spark executors

2016-07-07 Thread Mr rty ff
This is what I get when I run the command: 946 sun.tools.jps.Jps -lm / 7443 org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name Spark shell spark-shell. I don't think that should kill the SparkSubmit process. On Thursday, July 7, 2016 9:58 PM, Jacek Laskowski

Bad JIRA components

2016-07-07 Thread Nicholas Chammas
https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel There are several bad components in there, like docs, MLilb, and sq;. I’ve updated the issues that were assigned to them, but I don’t know if there is a way to delete these components

Re: Expanded docs for the various storage levels

2016-07-07 Thread Nicholas Chammas
JIRA is here: https://issues.apache.org/jira/browse/SPARK-16427 On Thu, Jul 7, 2016 at 3:18 PM Reynold Xin wrote: > Please create a patch. Thanks! > > > On Thu, Jul 7, 2016 at 12:07 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I’m looking at the docs

Re: [DISCUSS] Minimize use of MINOR, BUILD, and HOTFIX w/ no JIRA

2016-07-07 Thread Sean Owen
I don't agree that every change needs a JIRA, myself. Really, we didn't choose to have this system split across JIRA and Github PRs. It's necessitated by how the ASF works (and with some good reasons). But while we have this dual system, I figure, let's try to make some sense of it. I think it

Re: Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Michael Allman
FYI if you just want to look at the source code, there are source jars for those binary versions in maven central. I was just looking at the metastore source code last night. Michael > On Jul 7, 2016, at 12:13 PM, Jonathan Kelly wrote: > > I'm not sure, but I think

Re: Expanded docs for the various storage levels

2016-07-07 Thread Reynold Xin
Please create a patch. Thanks! On Thu, Jul 7, 2016 at 12:07 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I’m looking at the docs here: > > > http://spark.apache.org/docs/1.6.2/api/python/pyspark.html#pyspark.StorageLevel >

Re: Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Jonathan Kelly
I'm not sure, but I think it's https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2. It would be really nice though to have this whole process better documented and more "official" than just building from somebody's personal fork of Hive. Or is there some way that the Spark community

Expanded docs for the various storage levels

2016-07-07 Thread Nicholas Chammas
I’m looking at the docs here: http://spark.apache.org/docs/1.6.2/api/python/pyspark.html#pyspark.StorageLevel A newcomer to Spark won’t understand the meaning of _2, or the meaning of _SER (or its value), and
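To illustrate what those suffixes encode for a newcomer, here is a minimal, hypothetical sketch (not the real pyspark.StorageLevel class, whose exact fields differ): `_2` means each partition is replicated on two nodes, and `_SER` means objects are kept in serialized form rather than as deserialized objects.

```python
# Hypothetical stand-in for pyspark.StorageLevel, for illustration only.
from collections import namedtuple

StorageLevel = namedtuple(
    "StorageLevel", ["use_disk", "use_memory", "deserialized", "replication"])

# Roughly what the documented constants encode:
MEMORY_ONLY = StorageLevel(False, True, True, 1)
MEMORY_ONLY_2 = StorageLevel(False, True, True, 2)     # "_2": replication = 2
MEMORY_ONLY_SER = StorageLevel(False, True, False, 1)  # "_SER": serialized
MEMORY_AND_DISK = StorageLevel(True, True, True, 1)    # spill to disk allowed

print(MEMORY_ONLY_2.replication)      # 2
print(MEMORY_ONLY_SER.deserialized)   # False
```

(Note that in PySpark specifically, data is always serialized through the Python worker, which is part of why the `_SER` distinction confuses newcomers.)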

Re: Stopping Spark executors

2016-07-07 Thread Jacek Laskowski
Hi, Use jps -lm and see the processes on the machine(s) to kill. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Wed, Jul 6, 2016 at 9:49 PM, Mr rty ff
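As a sketch of that workflow: given captured `jps -lm` output (a sample is hard-coded below; in practice you would run the command), you can pick out the PIDs of the JVMs to kill by main-class name. The sample PIDs are illustrative.

```python
# Sample output as `jps -lm` might print it (pid, main class, arguments).
sample = """\
946 sun.tools.jps.Jps -lm
7443 org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name Spark shell spark-shell
7521 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url ...
"""

def pids_for(jps_output, class_name):
    """Return PIDs of JVMs whose main class / args mention class_name."""
    pids = []
    for line in jps_output.splitlines():
        pid, _, rest = line.partition(" ")
        if class_name in rest:
            pids.append(int(pid))
    return pids

print(pids_for(sample, "CoarseGrainedExecutorBackend"))  # -> [7521]
# os.kill(pid, signal.SIGTERM) would then stop just that executor JVM.
```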

Re: SPARK-8813 - combining small files in spark sql

2016-07-07 Thread Reynold Xin
When using native data sources (e.g. Parquet, ORC, JSON, ...), partitions are automatically merged so they would add up to a specific size, configurable by spark.sql.files.maxPartitionBytes. spark.sql.files.openCostInBytes is used to specify the cost of each "file". That is, an empty file will be
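The packing behavior Reynold describes can be sketched as follows. This is not Spark's actual implementation (which also caps the split size based on parallelism), only an illustration of how padding each file by an assumed open cost lets many small files collapse into a few partitions:

```python
def pack_files(file_sizes, max_partition_bytes, open_cost_in_bytes):
    """Greedily pack files into partitions; each file is padded by an
    open cost, so even empty files are not 'free' to schedule."""
    partitions, current, current_bytes = [], [], 0
    for size in file_sizes:
        padded = size + open_cost_in_bytes
        if current and current_bytes + padded > max_partition_bytes:
            partitions.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += padded
    if current:
        partitions.append(current)
    return partitions

# Six tiny files end up in two partitions instead of six:
print(pack_files([10] * 6, max_partition_bytes=128, open_cost_in_bytes=20))
# -> [[10, 10, 10, 10], [10, 10]]
```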

Re: [DISCUSS] Minimize use of MINOR, BUILD, and HOTFIX w/ no JIRA

2016-07-07 Thread Tom Graves
Popping this back up to the dev list again.  I see a bunch of checkins with minor or hotfix.   It seems to me we shouldn't be doing this, but I would like to hear thoughts from others.  I see no reason we can't have a jira for each of those issues, it only takes a few seconds to file one and it

Re: Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Marcelo Vanzin
(Actually that's "spark" and not "spark2", so yeah, that doesn't really answer the question.) On Thu, Jul 7, 2016 at 11:38 AM, Marcelo Vanzin wrote: > My guess would be https://github.com/pwendell/hive/tree/release-1.2.1-spark > > On Thu, Jul 7, 2016 at 11:37 AM, Zhan Zhang

Re: Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Marcelo Vanzin
My guess would be https://github.com/pwendell/hive/tree/release-1.2.1-spark On Thu, Jul 7, 2016 at 11:37 AM, Zhan Zhang wrote: > I saw the pom file having hive version as > 1.2.1.spark2. But I cannot find the branch in > https://github.com/pwendell/ > > Does anyone know where

Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Zhan Zhang
I saw the pom file having hive version as 1.2.1.spark2. But I cannot find the branch in https://github.com/pwendell/ Does anyone know where the repo is? Thanks. Zhan Zhang

Why the org.apache.spark.sql.catalyst.expressions.SortArray is with CodegenFallback?

2016-07-07 Thread 楊閔富
I found that CollapseCodegenStages.supportCodegen(e: Expression) determines that the SortArray expression is not codegen-supported, since SortArray is with CodegenFallback. Can I ask why SortArray is not codegen-supported?
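The check being asked about can be illustrated with a toy sketch: an expression tree supports whole-stage codegen only if no node in it falls back to interpreted evaluation. The class names below mirror Catalyst's, but the implementation is hypothetical and much simplified:

```python
class Expression:
    """Toy expression node with child expressions."""
    def __init__(self, *children):
        self.children = children

class CodegenFallback(Expression):
    """Marker: this expression is evaluated interpreted, not code-generated."""

def support_codegen(expr):
    """An expression tree supports codegen only if no node is a fallback."""
    if isinstance(expr, CodegenFallback):
        return False
    return all(support_codegen(c) for c in expr.children)

class Add(Expression): pass
class SortArray(CodegenFallback): pass  # mixes in the fallback, as in Catalyst

print(support_codegen(Add(Expression(), Expression())))  # True
print(support_codegen(Add(SortArray(), Expression())))   # False
```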

Re: Understanding pyspark data flow on worker nodes

2016-07-07 Thread Amit Rana
As mentioned in the documentation: PythonRDD objects launch Python subprocesses and communicate with them using pipes, sending the user's code and the data to be processed. I am trying to understand the implementation of how this data transfer is happening using pipes. Can anyone please guide
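A toy sketch of the pipe mechanism the thread asks about: a parent process (playing the executor) writes pickled data to a child process (playing the Python worker) over stdin and reads results back from stdout. The real PythonRDD protocol is considerably more involved (length-prefixed frames, a serialized function, accumulator channels), so treat this only as an illustration of the pipe idea:

```python
import pickle
import subprocess
import sys

# The "worker": reads pickled data from stdin, writes pickled results to stdout.
worker_code = (
    "import sys, pickle\n"
    "data = pickle.load(sys.stdin.buffer)\n"
    "pickle.dump([x * 2 for x in data], sys.stdout.buffer)\n"
)

# The "executor" side: launch the worker and talk to it over pipes.
proc = subprocess.Popen([sys.executable, "-c", worker_code],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = proc.communicate(pickle.dumps([1, 2, 3]))
print(pickle.loads(out))  # -> [2, 4, 6]
```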

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
Hi Mark, I agree. :-) We already have a product released with Spark 1.4.1 with some custom extensions and now we are doing a patch release. We will update Spark to the latest 2.x version in the next release. Best On Thu, Jul 7, 2016 at 1:12 PM, Mark Hamstra wrote: >

Re: Understanding pyspark data flow on worker nodes

2016-07-07 Thread Sun Rui
You can read https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals For pySpark data flow on worker nodes, you can read the source code of PythonRDD.scala. Python worker processes communicate with Spark executors

Understanding pyspark data flow on worker nodes

2016-07-07 Thread Amit Rana
Hi all, I am trying to trace the data flow in PySpark. I am using IntelliJ IDEA on Windows 7. I had submitted a Python job as follows: --master local[4] I have made the following insights after running the above command in debug mode: -> Locally, when a PySpark interpreter starts, it also

Re: SPARK-8813 - combining small files in spark sql

2016-07-07 Thread Sean Owen
-user Reynold made the comment that he thinks this was resolved by another change; maybe he can comment. On Thu, Jul 7, 2016 at 7:53 AM, Ajay Srivastava wrote: > Hi, > > This jira https://issues.apache.org/jira/browse/SPARK-8813 is fixed in spark > 2.0. > But

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Mark Hamstra
You've got to satisfy my curiosity, though. Why would you want to run such a badly out-of-date version in production? I mean, 2.0.0 is just about ready for release, and lagging three full releases behind, with one of them being a major version release, is a long way from where Spark is now. On

SparkSQL Added file get Exception: is a directory and recursive is not turned on

2016-07-07 Thread linxi zeng
Hi, all: As recorded in https://issues.apache.org/jira/browse/SPARK-16408, when using Spark SQL to execute SQL like: add file hdfs://xxx/user/test; If the HDFS path (hdfs://xxx/user/test) is a directory, then we will get an exception like: org.apache.spark.SparkException: Added file
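A minimal sketch (not Spark's actual code) of the check behind that error message: the add-file path refuses a directory unless recursive handling is enabled. The function and exception names here are illustrative stand-ins:

```python
import os
import tempfile

class SparkException(Exception):
    pass

def add_file(path, recursive=False):
    """Register a file; reject directories unless recursive is turned on."""
    if os.path.isdir(path) and not recursive:
        raise SparkException(
            "Added file %s is a directory and recursive is not turned on" % path)
    return path  # real code would copy/register the file here

d = tempfile.mkdtemp()
try:
    add_file(d)            # directory without recursive -> error, as in the JIRA
except SparkException as e:
    print("raised:", e)
print(add_file(d, recursive=True))  # directory accepted when recursive is on
```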

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
Thanks Reynold On Thu, Jul 7, 2016 at 11:40 AM, Reynold Xin wrote: > Yes definitely. > > > On Wed, Jul 6, 2016 at 11:08 PM, Niranda Perera > wrote: > >> Thanks Reynold for the prompt response. Do you think we could use a >> 1.4-branch latest build

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Reynold Xin
Yes definitely. On Wed, Jul 6, 2016 at 11:08 PM, Niranda Perera wrote: > Thanks Reynold for the prompt response. Do you think we could use a > 1.4-branch latest build in a production environment? > > > > On Thu, Jul 7, 2016 at 11:33 AM, Reynold Xin

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
Thanks Reynold for the prompt response. Do you think we could use a 1.4-branch latest build in a production environment? On Thu, Jul 7, 2016 at 11:33 AM, Reynold Xin wrote: > I think last time I tried I had some trouble releasing it because the > release scripts no longer

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Reynold Xin
I think last time I tried I had some trouble releasing it because the release scripts no longer work with branch-1.4. You can build from the branch yourself, but it might be better to upgrade to the later versions. On Wed, Jul 6, 2016 at 11:02 PM, Niranda Perera wrote:

Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
Hi guys, May I know if you have halted development in the Spark 1.4 branch? I see that there is a release tag for 1.4.2 but it was never released. Can we expect a 1.4.x bug fixing release anytime soon? Best -- Niranda @n1r44 +94-71-554-8430