Spark 1.2.2 build problem with Hive 0.12, bringing in wrong version of avro-mapred

2015-08-12 Thread java8964
Hi, This email is sent to both the dev and user lists; I just want to see if someone familiar with the Spark/Maven build procedure can provide any help. I am building Spark 1.2.2 with the following command: mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-0.12.0 The spark-assembly-1.2.2-hadoop2.2.0.jar

Re: PySpark on PyPi

2015-08-12 Thread quasiben
I've helped to build conda-installable Spark packages in the past. You can find an older recipe here: https://github.com/conda/conda-recipes/tree/master/spark And I've been updating packages here: https://anaconda.org/anaconda-cluster/spark `conda install -c anaconda-cluster spark` The above

Intermittent timeout failure org/apache/spark/sql/hive/thriftserver/CliSuite.scala

2015-08-12 Thread Tim Preece
I was just debugging an intermittent timeout failure in the test suite CliSuite.scala. I traced it down to a timing window in the Scala library class sys.process.ProcessImpl.scala. Sometimes the input pipe to a process becomes None before the process has had a chance to read any input at all.
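For context, a minimal sketch of the pattern the suite relies on (the command and SQL statements here are placeholders, not the suite's actual code): the test drives a child process's stdin through scala.sys.process, which is where the racy input pipe lives.

```scala
import java.io.{OutputStream, PrintWriter}
import scala.io.Source
import scala.sys.process.{Process, ProcessIO}

// Hypothetical statements fed to the child process's stdin.
val statements = Seq("SHOW TABLES;", "QUIT;")

val io = new ProcessIO(
  // stdin handler: runs on its own thread; if the pipe is torn down before
  // the child reads, the input is silently lost (the window described above).
  (stdin: OutputStream) => {
    val writer = new PrintWriter(stdin)
    statements.foreach(s => writer.println(s))
    writer.close()
  },
  // stdout handler: collect the child's output line by line.
  stdout => Source.fromInputStream(stdout).getLines().foreach(println),
  // stderr handler: drain it so the child cannot block on a full pipe.
  stderr => Source.fromInputStream(stderr).getLines().foreach(_ => ())
)

val proc = Process(Seq("bin/spark-sql")).run(io)  // placeholder command
println(s"exit code: ${proc.exitValue()}")        // blocks until the child exits
```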

Re: possible issues with listing objects in the HadoopFsRelation

2015-08-12 Thread Cheng Lian
Hi Gil, Sorry for the late reply and thanks for raising this question. The file listing logic in HadoopFsRelation is intentionally made different from Hadoop FileInputFormat. Here are the reasons: 1. Efficiency: when computing RDD partitions, FileInputFormat.listStatus() is called on the

Re: possible issues with listing objects in the HadoopFsRelation

2015-08-12 Thread Gil Vernik
Hi Cheng, Thanks a lot for responding to it. I'm still missing some points on the efficiency argument, and I would be very thankful if you could expand on it a little bit more. As I see it, both HadoopFsRelation and FileInputFormat.listStatus perform listings, and eventually both call the FileSystem.listStatus method.
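For reference, a minimal sketch (with a hypothetical path) of the common call both code paths eventually reach; the difference under discussion is where and how often this listing is issued, not the call itself.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

// Hypothetical input path; both HadoopFsRelation and FileInputFormat end up
// asking the FileSystem for the children of directories like this one.
val path = new Path("hdfs:///user/data/table")
val fs = path.getFileSystem(new Configuration())

// One namenode RPC (or one object-store LIST call) per directory listed.
val statuses: Array[FileStatus] = fs.listStatus(path)
statuses.foreach(s => println(s"${s.getPath} (${s.getLen} bytes)"))
```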

Does Spark optimization might miss to run transformation?

2015-08-12 Thread Eugene Morozov
Hi! I’d like to perform an action (store / print something) inside a transformation (map or mapPartitions). This approach has some flaws, but here is the question: might it happen that Spark will optimise the (RDD or DataFrame) processing so that my mapPartitions simply won’t happen? -- Eugene Morozov
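A minimal sketch of the situation being asked about (assuming a plain RDD and a local master): the side effect inside mapPartitions is not dropped by any optimization, but it also does not run until an action forces the lineage, and it can run again if the RDD is recomputed or a task is retried.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lazy-demo").setMaster("local[*]"))

val rdd = sc.parallelize(1 to 100, numSlices = 4).mapPartitions { iter =>
  // The "store / print" side effect lives inside the transformation.
  println("processing a partition")
  iter.map(_ * 2)
}

// Nothing has been printed yet: mapPartitions only recorded a lineage step.
rdd.count()   // The action triggers a job; each partition function now runs.
rdd.collect() // Without caching, the side effect runs again on recomputation.

sc.stop()
```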

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-12 Thread Imran Rashid
Yikes. Was this a one-time thing, or does it happen consistently? Can you turn on debug logging for o.a.s.scheduler? (Dunno if it will help, but maybe ...) On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Hi My Spark job (running in local[*] with spark 1.4.1)
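If it helps, a hedged sketch of one way to follow that suggestion; Spark 1.4.x ships log4j 1.x, so the equivalent conf/log4j.properties line would be log4j.logger.org.apache.spark.scheduler=DEBUG.

```scala
import org.apache.log4j.{Level, Logger}

// Raise the scheduler package (o.a.s.scheduler) to DEBUG at runtime,
// e.g. right after creating the SparkContext in the failing job.
Logger.getLogger("org.apache.spark.scheduler").setLevel(Level.DEBUG)
```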

Re: Intermittent timeout failure org/apache/spark/sql/hive/thriftserver/CliSuite.scala

2015-08-12 Thread Reynold Xin
Thanks for finding this. Should we just switch to Java's process library for now? On Wed, Aug 12, 2015 at 1:30 AM, Tim Preece tepre...@mail.com wrote: I was just debugging an intermittent timeout failure in the test suite CliSuite.scala. I traced it down to a timing window in the Scala
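A rough sketch (placeholder command and input, not a patch) of what "switch to Java's process library" could look like: java.lang.ProcessBuilder hands the caller the child's stdin directly, so there is no wrapper-managed pipe that can go away before the child has read from it.

```scala
import java.io.PrintWriter
import scala.io.Source

val builder = new ProcessBuilder("bin/spark-sql")  // placeholder command
builder.redirectErrorStream(true)                  // fold stderr into stdout

val proc = builder.start()

// The caller owns this stream; it stays open until we close it ourselves.
val stdin = new PrintWriter(proc.getOutputStream)
Seq("SHOW TABLES;", "QUIT;").foreach(s => stdin.println(s))
stdin.close()

// Read whatever the child wrote, then wait for it to exit.
Source.fromInputStream(proc.getInputStream).getLines().foreach(println)
println(s"exit code: ${proc.waitFor()}")
```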