SparkSQL - Caching RDDs

2015-04-01 Thread Venkat, Ankam
Thanks! Regards, Venkat Ankam
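
The preview above has no technical detail, but for the subject itself, here is a minimal sketch of the two usual ways to cache data for SparkSQL in the Spark 1.x Python API; the SQLContext setup and the JSON input path are assumptions, not taken from the thread.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="CachingExample")
    sqlContext = SQLContext(sc)

    # Hypothetical input; any DataFrame source works the same way.
    df = sqlContext.jsonFile("hdfs:///tmp/events.json")

    # Option 1: cache the DataFrame (SchemaRDD in pre-1.3 terms) directly.
    df.cache()

    # Option 2: register a temp table and cache it by name, which stores it
    # in SparkSQL's in-memory columnar format.
    df.registerTempTable("events")
    sqlContext.cacheTable("events")

    # The cache is materialized by the first action that scans the table.
    sqlContext.sql("SELECT COUNT(*) FROM events").collect()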

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-23 Thread Venkat, Ankam
Spark Committers: Please advise the way forward for this issue. Thanks for your support. Regards, Venkat From: Venkat, Ankam Sent: Thursday, January 22, 2015 9:34 AM To: 'Frank Austin Nothaft'; 'user@spark.apache.org' Cc: 'Nick Allen' Subject: RE: How to 'Pip

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
How much time would it take to port it? Spark committers: Please let us know your thoughts. Regards, Venkat From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Thursday, January 22, 2015 9:08 AM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
What's your take on this? Regards, Venkat Ankam From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Wednesday, January 21, 2015 12:30 PM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark Hi Venkat/Nick, The

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-21 Thread Venkat, Ankam
'/usr/local/bin/sox', '-t' 'wav', '-', '-n', 'stats'])).collect() <-- Does not work. Tried different options. AttributeError: 'function' object has no attribute 'read' Any suggestions? Regards, V
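
For context on the error quoted above: RDD.pipe takes a shell command string and streams each element to the command as a line of text, while subprocess.call runs the command immediately on the driver and returns an integer exit code, which is why passing its result into pipe fails. A minimal sketch of the intended call shape (the command used here is just an illustration):

    from pyspark import SparkContext

    sc = SparkContext(appName="PipeExample")

    # pipe() forks the given command once per partition and writes each RDD
    # element to the command's stdin as a newline-terminated string.
    rdd = sc.parallelize(["one", "two", "three"])
    upper = rdd.pipe("tr 'a-z' 'A-Z'")
    print(upper.collect())  # ['ONE', 'TWO', 'THREE']

Because pipe() is line/text oriented, raw WAV bytes pushed through it will be mangled; a binary-safe alternative is sketched under the next entry.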

Processing .wav files in PySpark

2015-01-16 Thread Venkat, Ankam
wavfile = sc.textFile('hdfs://xxx:8020/user/ab00855/ext2187854_03_27_2014.wav') wavfile.pipe(subprocess.call(['sox', '-t' 'wav', '-', '-n', 'stats'])) I tried different options like sc.binaryFiles and sc.pickleFile. Any thoughts
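
A hedged sketch of a binary-safe alternative to the snippet above, assuming sox is installed on every worker: read the audio with sc.binaryFiles (available from Spark 1.2) and feed the bytes to sox through subprocess.Popen inside a map, instead of going through sc.textFile and RDD.pipe. The wildcard path is illustrative.

    import subprocess
    from pyspark import SparkContext

    sc = SparkContext(appName="WavStats")

    def sox_stats(record):
        path, data = record  # data holds the raw bytes of one .wav file
        p = subprocess.Popen(
            ['/usr/local/bin/sox', '-t', 'wav', '-', '-n', 'stats'],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate(data)
        return (path, err)  # the sox 'stats' effect writes its report to stderr

    # binaryFiles yields (path, bytes) pairs, one per file.
    wavs = sc.binaryFiles('hdfs://xxx:8020/user/ab00855/*.wav')
    for path, stats in wavs.map(sox_stats).collect():
        print(path, stats)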

RE: MLlib vs Madlib

2014-12-14 Thread Venkat, Ankam
perform large scale text analytics and I can store data on HDFS or on Pivotal Greenplum/Hawq. Regards, Venkat Ankam From: Brian Dolan [mailto:buddha_...@yahoo.com] Sent: Sunday, December 14, 2014 10:02 AM To: Venkat, Ankam Cc: 'user@spark.apache.org' Subject: Re: MLlib vs Madlib MADLib (http:

MLlib vs Madlib

2014-12-14 Thread Venkat, Ankam
Can somebody throw light on MLlib vs Madlib? Which is better for machine learning? And are there any specific use-case scenarios in which MLlib or Madlib will shine? Regards, Venkat Ankam
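
The MLlib vs MADlib question is mostly about where the data and compute live (MLlib runs inside Spark against HDFS data, MADlib runs in-database on Greenplum/HAWQ), but since the use case raised in the thread is large-scale text analytics, here is a minimal, hedged MLlib sketch of that kind of workflow in the Spark 1.x Python API: hashed term-frequency features plus a Naive Bayes classifier. The input path and label format are hypothetical.

    from pyspark import SparkContext
    from pyspark.mllib.feature import HashingTF
    from pyspark.mllib.classification import NaiveBayes
    from pyspark.mllib.regression import LabeledPoint

    sc = SparkContext(appName="TextAnalytics")

    # Hypothetical input: one labeled document per line, "label<TAB>text".
    raw = sc.textFile("hdfs:///data/labeled_docs.tsv")
    tf = HashingTF(numFeatures=10000)

    def to_point(line):
        label, text = line.split("\t", 1)
        return LabeledPoint(float(label), tf.transform(text.split()))

    training = raw.map(to_point).cache()
    model = NaiveBayes.train(training)
    print(model.predict(tf.transform("example document text".split())))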

RE: Spark Streaming with Python

2014-11-25 Thread Venkat, Ankam
Any idea how to resolve this? Regards, Venkat From: Venkat, Ankam Sent: Sunday, November 23, 2014 12:05 PM To: 'user@spark.apache.org' Subject: Spark Streaming with Python I am trying to run the network_wordcount.py example mentioned at https://github.com/apache/spark/blob/master/example

Python Logistic Regression error

2014-11-23 Thread Venkat, Ankam
Can you please suggest sample data for running logistic_regression.py? I am trying to use the sample data file at https://github.com/apache/spark/blob/master/data/mllib/sample_linear_regression_data.txt I am running this on the CDH5.2 Quickstart VM. [cloudera@quickstart mllib]$ spark-submit logi
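
One likely cause of the error above is that sample_linear_regression_data.txt carries continuous labels, while logistic regression expects 0/1 class labels; that is an inference, not something stated in the truncated preview. A minimal sketch with the binary-classification sample that also ships under data/mllib, using the Spark 1.x Python MLlib API:

    from pyspark import SparkContext
    from pyspark.mllib.util import MLUtils
    from pyspark.mllib.classification import LogisticRegressionWithSGD

    sc = SparkContext(appName="LogisticRegressionExample")

    # LIBSVM-format file with 0/1 labels, shipped with Spark under data/mllib.
    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

    model = LogisticRegressionWithSGD.train(data, iterations=100)

    # Training error on the same data, just as a sanity check.
    preds = data.map(lambda p: (model.predict(p.features), p.label))
    err = preds.filter(lambda pl: pl[0] != pl[1]).count() / float(data.count())
    print("Training error: %s" % err)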

Spark Streaming with Python

2014-11-23 Thread Venkat, Ankam
I am trying to run the network_wordcount.py example mentioned at https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py on the CDH5.2 Quickstart VM and am getting the error below. Traceback (most recent call last): File "/usr/lib/spark/examples/lib/network_wordcoun
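
The traceback is cut off above, but the linked example is the standard socket wordcount; a minimal sketch of the same pattern follows. Note that the Python streaming API arrived in Spark 1.2, and the CDH 5.2 VM mentioned here ships an older Spark, which may itself be the source of the failure (that version detail is a recollection, not confirmed by the thread).

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Minimal socket wordcount, the same pattern as the linked example.
    sc = SparkContext(appName="PythonStreamingNetworkWordCount")
    ssc = StreamingContext(sc, 1)  # 1-second batches

    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()

Feed the socket with something like nc -lk 9999 in a separate terminal; the actual example takes the hostname and port as command-line arguments.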