Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-14 Thread Akhil Das
is to show overlapping partitions, duplicates, index-to-partition mismatch - that sort of thing. On Thu, Aug 13, 2015 at 11:42 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Yep, and it works fine for operations which do not involve any shuffle (like foreach, count etc.) and those which

Re: saveToCassandra not working in Spark Job but works in Spark Shell

2015-08-14 Thread Akhil Das
Looks like a jar version conflict to me. Thanks Best Regards On Thu, Aug 13, 2015 at 7:59 PM, satish chandra j jsatishchan...@gmail.com wrote: Hi, please let me know if I am missing anything in the below mail to get the issue fixed. Regards, Satish Chandra On Wed, Aug 12, 2015 at 6:59

Re: Spark job endup with NPE

2015-08-14 Thread Akhil Das
You can put a try..catch around all the transformations that you are doing and catch such exceptions instead of crashing your entire job. Thanks Best Regards On Fri, Aug 14, 2015 at 4:35 PM, hide x22t33...@gmail.com wrote: Hello, I'm using spark on yarn cluster and using
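
A minimal sketch of that pattern (assuming an existing RDD `rdd` of strings; the transformation is a stand-in for whatever may throw):

  val safe = rdd.map { record =>
    try {
      record.toUpperCase        // stand-in for the real transformation
    } catch {
      case e: Exception => null // keep the task alive instead of crashing the job
    }
  }.filter(_ != null)           // drop the records that failed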

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-13 Thread Akhil Das
12, 2015 at 10:57 PM, Imran Rashid iras...@cloudera.com wrote: yikes. Was this a one-time thing? Or does it happen consistently? can you turn on debug logging for o.a.s.scheduler (dunno if it will help, but maybe ...) On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das ak...@sigmoidanalytics.com

Re: Error writing to cassandra table using spark application

2015-08-13 Thread Akhil Das
Can you look in the worker logs and see what's going wrong? Thanks Best Regards On Wed, Aug 12, 2015 at 9:53 PM, Nupur Kumar (BLOOMBERG/ 731 LEX) nkumar...@bloomberg.net wrote: Hello, I am doing this for the first time so feel free to let me know/forward this to where it needs to be if not

Re: ClassNotFound spark streaming

2015-08-13 Thread Akhil Das
You need to add that jar to the classpath. While submitting the job, you can use the --jars, --driver-class-path etc. configurations to add the jar. Apart from that, if you are running the job as a standalone application, then you can use the sc.addJar option to add the jar (which will ship this jar into
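
For the standalone-application case, a minimal sketch (the jar path is hypothetical):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("my-app"))
  sc.addJar("/path/to/dependency.jar") // ships this jar to the executors for this job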

Re: Switch from Sort based to Hash based shuffle

2015-08-13 Thread Akhil Das
Have a look at spark.shuffle.manager; you can switch between sort and hash with this configuration. spark.shuffle.manager (default: sort) - Implementation to use for shuffling data. There are two implementations available: sort and hash. Sort-based shuffle is more memory-efficient and is the default option
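
A minimal sketch of switching the shuffle implementation (Spark 1.x, where both managers are available):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("shuffle-demo")
    .set("spark.shuffle.manager", "hash") // default is "sort"
  val sc = new SparkContext(conf)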

Re: Spark Streaming dealing with broken files without dying

2015-08-11 Thread Akhil Das
You can do something like this:

  val fStream = ssc.textFileStream("/sigmoid/data/").map(x => {
    try {
      // Move all the transformations within a try..catch
    } catch {
      case e: Exception => { logError("Whoops!!"); null }
    }
  })

Thanks Best Regards On Mon, Aug 10, 2015 at 7:44 PM, Mario Pastorelli

Re: Differents of loading data

2015-08-11 Thread Akhil Das
Load data to where? To Spark? If you are referring to Spark, then there are some differences in the way the connector is implemented. When you use Spark, the most important thing that you get is the parallelism (depending on the number of partitions). If you compare it with a native Java driver then

Re: Java Streaming Context - File Stream use

2015-08-11 Thread Akhil Das
Like this (including the filter function):

  JavaPairInputDStream<LongWritable, Text> inputStream = ssc.fileStream(
      testDir.toString(), LongWritable.class, Text.class, TextInputFormat.class,
      new Function<Path, Boolean>() {
        @Override public Boolean call(Path

Re: Pushing Spark to 10Gb/s

2015-08-11 Thread Akhil Das
Hi Starch, It also depends on the application's behavior; some might not be able to properly utilize the network. If you are using, say, Kafka, then one thing that you should keep in mind is the size of the individual messages and the number of partitions that you have. The higher the message

Re: Inquery about contributing codes

2015-08-11 Thread Akhil Das
You can create a new issue and send a pull request for the same, I think. + dev list Thanks Best Regards On Tue, Aug 11, 2015 at 8:32 AM, Hyukjin Kwon gurwls...@gmail.com wrote: Dear Sir / Madam, I have a plan to contribute some code about passing filters to a datasource as physical

Re: Spark with GCS Connector - Rate limit error

2015-08-11 Thread Akhil Das
There's a daily quota and a minutely quota; you could be hitting those. You can ask Google to increase the quota for the particular service. Now, to reduce the limit from the Spark side, you can actually do a re-partition to a smaller number before doing the save. Another way is to use the local file
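
A minimal sketch of shrinking the partition count before the save (the RDD, partition count, and output path are hypothetical):

  rdd.repartition(8)                      // fewer partitions means fewer concurrent writers
    .saveAsTextFile("gs://my-bucket/out") // and fewer simultaneous requests against the quota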

Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-11 Thread Akhil Das
Hi, My Spark job (running in local[*] with Spark 1.4.1) reads data from a thrift server (I created an RDD; it computes the partitions in the getPartitions() call, and in compute() hasNext returns records from these partitions). count() and foreach() are working fine and return the correct number of

Re: Possible issue for Spark SQL/DataFrame

2015-08-10 Thread Akhil Das
Isn't it space-separated data? It is not comma (,) separated nor pipe (|) separated data. Thanks Best Regards On Mon, Aug 10, 2015 at 12:06 PM, Netwaver wanglong_...@163.com wrote: Hi Spark experts, I am now using Spark 1.4.1 and trying the Spark SQL/DataFrame API with text

Re: Spark-submit not finding main class and the error reflects different path to jar file than specified

2015-08-09 Thread Akhil Das
Are you setting SPARK_PREPEND_CLASSES? Try to disable it. Here your uber jar, which does not have the SparkConf, is put first on the class-path, which is messing it up. Thanks Best Regards On Thu, Aug 6, 2015 at 5:48 PM, Stephen Boesch java...@gmail.com wrote: Given the following

Re: Multiple Thrift servers on one Spark cluster

2015-08-09 Thread Akhil Das
Did you try this way?

  export HIVE_SERVER2_THRIFT_PORT=6066
  ./sbin/start-thriftserver.sh --master <master-uri>
  export HIVE_SERVER2_THRIFT_PORT=6067
  ./sbin/start-thriftserver.sh --master <master-uri>

You just have to change HIVE_SERVER2_THRIFT_PORT to instantiate multiple servers, I think. Thanks

Re: SparkException: Yarn application has already ended

2015-08-09 Thread Akhil Das
Just make sure your hadoop instances are functioning properly (check for ResourceManager, NodeManager). How are you submitting the job? If it is getting submitted, then you can look further in the yarn logs to see what's really going on. Thanks Best Regards On Thu, Aug 6, 2015 at 6:59 PM, Clint

Re: Temp file missing when training logistic regression

2015-08-09 Thread Akhil Das
Which version of Spark are you using? Looks like you are hitting the file handle limit. In that case you might want to increase the ulimit. You can actually validate this by looking in the worker logs (which would probably show a Too many open files exception). Thanks Best Regards On Thu, Aug 6, 2015 at

Re: Spark Job Failed (Executor Lost then FS closed)

2015-08-09 Thread Akhil Das
Can you look more in the worker logs and see what's going on? It looks like a memory issue (some kind of GC overhead etc.; you need to look in the worker logs). Thanks Best Regards On Fri, Aug 7, 2015 at 3:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: Re attaching the images. On Thu, Aug 6, 2015

Re: Spark-submit fails when jar is in HDFS

2015-08-09 Thread Akhil Das
Did you try this way? /usr/local/spark/bin/spark-submit --master mesos://mesos.master:5050 --conf spark.mesos.executor.docker.image=docker.repo/spark:latest --class org.apache.spark.examples.SparkPi --jars hdfs://hdfs1/tmp/spark-examples-1.4.1-hadoop2.6.0-cdh5.4.4.jar 100 Thanks Best

Re: Out of memory with twitter spark streaming

2015-08-09 Thread Akhil Das
I'm not sure what you are up to, but if you can explain what you are trying to achieve then maybe we can restructure your code. On a quick glance I could see: tweetsRDD.collect().map(tweet => DBQuery.saveTweets(tweet)), which will bring the whole data onto your driver machine, and it would
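
A sketch of keeping the writes on the executors instead (DBQuery.saveTweets is the function from the original code):

  tweetsRDD.foreachPartition { tweets =>
    // runs on the executors; nothing is pulled back to the driver
    tweets.foreach(tweet => DBQuery.saveTweets(tweet))
  }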

Re: Accessing S3 files with s3n://

2015-08-09 Thread Akhil Das
Depends on which operation you are doing. If you are doing a .count() on a Parquet file, it might not download the entire file, I think, but if you do a .count() on a normal text file it might pull the entire file. Thanks Best Regards On Sat, Aug 8, 2015 at 3:12 AM, Akshat Aranya aara...@gmail.com

Re: using Spark or pig group by efficient in my use case?

2015-08-09 Thread Akhil Das
Why not give it a shot? Spark generally outruns old MapReduce jobs. Thanks Best Regards On Sat, Aug 8, 2015 at 8:25 AM, linlma lin...@gmail.com wrote: I have tens of millions of records, which are customer ID and city ID pairs. There are tens of millions of unique customer IDs, and only a few

Re: How to run start-thrift-server in debug mode?

2015-08-09 Thread Akhil Das
It seems it is not able to pick up the debug parameters. You can actually set export _JAVA_OPTIONS="-agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=y" and then submit the job to enable debugging. Thanks Best Regards On Fri, Aug 7, 2015 at 10:20 PM, Benjamin Ross

Re: Multiple UpdateStateByKey Functions in the same job?

2015-08-06 Thread Akhil Das
I think you can. Give it a try and see. Thanks Best Regards On Tue, Aug 4, 2015 at 7:02 AM, swetha swethakasire...@gmail.com wrote: Hi, Can I use multiple UpdateStateByKey Functions in the Streaming job? Suppose I need to maintain the state of the user session in the form of a Json and
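
A minimal sketch of two independent updateStateByKey states over the same stream (assuming events is a DStream[(String, Long)] and checkpointing is enabled; the update functions are hypothetical):

  def updateCount(values: Seq[Long], state: Option[Long]): Option[Long] =
    Some(values.sum + state.getOrElse(0L))          // running total per key

  def updateLastSeen(values: Seq[Long], state: Option[Long]): Option[Long] =
    if (values.isEmpty) state else Some(values.max) // most recent value per key

  val counts   = events.updateStateByKey(updateCount _)    // requires ssc.checkpoint(dir)
  val lastSeen = events.updateStateByKey(updateLastSeen _) // a second, independent state stream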

Re: NoSuchMethodError : org.apache.spark.streaming.scheduler.StreamingListenerBus.start()V

2015-08-06 Thread Akhil Das
For some reason you have two different versions of Spark jars in your classpath. Thanks Best Regards On Tue, Aug 4, 2015 at 12:37 PM, Deepesh Maheshwari deepesh.maheshwar...@gmail.com wrote: Hi, I am trying to read data from kafka and process it using spark. I have attached my source

Re: Twitter Connector-Spark Streaming

2015-08-06 Thread Akhil Das
...@platalytics.com wrote: thanks alot On Tue, Aug 4, 2015 at 2:00 PM, Akhil Das ak...@sigmoidanalytics.com wrote: You will have to write your own consumer for pulling your custom feeds, and then you can do a union (customfeedDstream.union(twitterStream)) with the twitter stream api. Thanks Best

Re: subscribe

2015-08-06 Thread Akhil Das
Welcome aboard! Thanks Best Regards On Thu, Aug 6, 2015 at 11:21 AM, Franc Carter franc.car...@rozettatech.com wrote: subscribe

Re: No Twitter Input from Kafka to Spark Streaming

2015-08-06 Thread Akhil Das
You just pasted your twitter credentials; consider changing them. :/ Thanks Best Regards On Wed, Aug 5, 2015 at 10:07 PM, narendra narencs...@gmail.com wrote: Thanks Akash for the answer. I added the endpoint to the listener and now it is working. -- View this message in context:

Re: How do I Process Streams that span multiple lines?

2015-08-04 Thread Akhil Das
If you are using Kafka, then you can basically push an entire file as a message to Kafka. In that case in your DStream, you will receive the single message which is the contents of the file and it can of course span multiple lines. Thanks Best Regards On Mon, Aug 3, 2015 at 8:27 PM, Spark

Re: How to help for 1.5 release?

2015-08-04 Thread Akhil Das
I think you can start from here https://issues.apache.org/jira/browse/SPARK/fixforversion/12332078/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel Thanks Best Regards On Tue, Aug 4, 2015 at 12:02 PM, Meihua Wu rotationsymmetr...@gmail.com wrote: I think the team is

Re: Running multiple batch jobs in parallel using Spark on Mesos

2015-08-04 Thread Akhil Das
One approach would be to use a Jobserver in between and create SparkContexts in it. Let's say you create two, one configured to run coarse-grained and the other set to fine-grained. Let the high-priority jobs hit the coarse-grained SparkContext and the other jobs use the fine-grained one.

Re: Writing to HDFS

2015-08-04 Thread Akhil Das
Just to add: rdd.take(1) won't trigger the entire computation; it will just pull out the first record. You need to do a rdd.count() or rdd.saveAs*Files to trigger the complete pipeline. How many partitions do you see in the last stage? Thanks Best Regards On Tue, Aug 4, 2015 at 7:10 AM, ayan guha
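
A short sketch of the difference (assuming an RDD `rdd`; the output path is hypothetical):

  rdd.take(1)                       // computes only as many partitions as needed for one record
  rdd.count()                       // touches every partition, running the whole pipeline
  rdd.saveAsTextFile("hdfs:///out") // likewise triggers the complete computation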

Re: Twitter Connector-Spark Streaming

2015-08-04 Thread Akhil Das
that I want to ask is that I have used Twitter's streaming API, and it seems that the above solution uses the REST API. How can I use both simultaneously? Any response will be much appreciated :) Regards On Tue, Aug 4, 2015 at 1:51 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Yes you can

Re: spark cluster setup

2015-08-03 Thread Akhil Das
Are you sitting behind a firewall and accessing a remote master machine? In that case, have a look at this http://spark.apache.org/docs/latest/configuration.html#networking; you might want to fix a few properties like spark.driver.host, spark.driver.port etc. Thanks Best Regards On Mon, Aug 3,

Re: Encryption on RDDs or in-memory/cache on Apache Spark

2015-08-02 Thread Akhil Das
Currently RDDs are not encrypted. I think you can go ahead and open a JIRA to add this feature, and maybe in a future release it could be added. Thanks Best Regards On Fri, Jul 31, 2015 at 1:47 PM, Matthew O'Reilly moreill...@qub.ac.uk wrote: Hi, I am currently working on the latest version of

Re: Does Spark Streaming need to list all the files in a directory?

2015-08-02 Thread Akhil Das
I guess it goes through those 500k files https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L193 for the first time and then uses a filter from the next time. Thanks Best Regards On Fri, Jul 31, 2015 at 4:39 AM, Tathagata Das

Re: unsubscribe

2015-08-02 Thread Akhil Das
LOL Brandon! @ziqiu See http://spark.apache.org/community.html You need to send an email to user-unsubscr...@spark.apache.org Thanks Best Regards On Fri, Jul 31, 2015 at 2:06 AM, Brandon White bwwintheho...@gmail.com wrote: https://www.youtube.com/watch?v=JncgoPKklVE On Thu, Jul 30, 2015

Re: Twitter Connector-Spark Streaming

2015-07-30 Thread Akhil Das
specific to my account? Thanks in anticipation :) On Thu, Jul 30, 2015 at 6:17 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Owh, this one fetches the public tweets, not the one specific to your account. Thanks Best Regards On Thu, Jul 30, 2015 at 6:11 PM, Sadaf Khan sa

Re: Spark SQL Error

2015-07-30 Thread Akhil Das
It seems to be an issue with the ES connector: https://github.com/elastic/elasticsearch-hadoop/issues/482 Thanks Best Regards On Tue, Jul 28, 2015 at 6:14 AM, An Tran tra...@gmail.com wrote: Hello all, I am currently having an error with Spark SQL accessing Elasticsearch using the Elasticsearch Spark

Re: streaming issue

2015-07-30 Thread Akhil Das
What operation are you doing with streaming? Also, can you look in the datanode logs and see what's going on? Thanks Best Regards On Tue, Jul 28, 2015 at 8:18 AM, guoqing0...@yahoo.com.hk guoqing0...@yahoo.com.hk wrote: Hi, I got an error when running spark streaming as below.

Re: Spark and Speech Recognition

2015-07-30 Thread Akhil Das
Like this? val data = sc.textFile("/sigmoid/audio/data/", 24).foreachPartition(urls => speachRecognizer(urls)) Let 24 be the total number of cores that you have on all the workers. Thanks Best Regards On Wed, Jul 29, 2015 at 6:50 AM, Peter Wolf opus...@gmail.com wrote: Hello, I am writing a

Re: Spark build/sbt assembly

2015-07-30 Thread Akhil Das
Did you try removing this jar? build/sbt-launch-0.13.7.jar Thanks Best Regards On Tue, Jul 28, 2015 at 12:08 AM, Rahul Palamuttam rahulpala...@gmail.com wrote: Hi All, I hope this is the right place to post troubleshooting questions. I've been following the install instructions and I get

Re: Heatmap with Spark Streaming

2015-07-30 Thread Akhil Das
You can easily push data to an intermediate storage from spark streaming (like HBase or a SQL/NoSQL DB etc) and then power your dashboards with d3 js. Thanks Best Regards On Tue, Jul 28, 2015 at 12:18 PM, UMESH CHAUDHARY umesh9...@gmail.com wrote: I have just started using Spark Streaming and

Re: sc.parallelize(512k items) doesn't always use 64 executors

2015-07-30 Thread Akhil Das
sc.parallelize takes a second parameter, which is the total number of partitions; are you using that? Thanks Best Regards On Wed, Jul 29, 2015 at 9:27 PM, Kostas Kougios kostas.koug...@googlemail.com wrote: Hi, I do an sc.parallelize with a list of 512k items. But sometimes not all executors
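
A minimal sketch (assuming a SparkContext `sc`):

  val items = (1 to 512000).toList
  val rdd = sc.parallelize(items, 64) // second argument: number of partitions to create
  println(rdd.partitions.length)      // 64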

Re: Heatmap with Spark Streaming

2015-07-30 Thread Akhil Das
at 1:23 PM, Akhil Das ak...@sigmoidanalytics.com wrote: You can easily push data to an intermediate storage from spark streaming (like HBase or a SQL/NoSQL DB etc) and then power your dashboards with d3 js. Thanks Best Regards On Tue, Jul 28, 2015 at 12:18 PM, UMESH CHAUDHARY umesh9

Re: java.lang.ArrayIndexOutOfBoundsException: 0 on Yarn Client

2015-07-28 Thread Akhil Das
) at java.lang.Thread.run(Thread.java:745) From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Tuesday, July 28, 2015 2:30 PM To: Manohar Reddy Cc: user@spark.apache.org Subject: Re: java.lang.ArrayIndexOutOfBoundsException: 0 on Yarn Client You need to trigger an action on your

Re: java.lang.ArrayIndexOutOfBoundsException: 0 on Yarn Client

2015-07-28 Thread Akhil Das
Put a try catch inside your code and inside the catch print out the length or the list itself which causes the ArrayIndexOutOfBounds. It might happen that some of your data is not proper. Thanks Best Regards On Mon, Jul 27, 2015 at 8:24 PM, Manohar753 manohar.re...@happiestminds.com wrote: Hi

Re: ReceiverStream SPARK not able to cope up with 20,000 events /sec .

2015-07-28 Thread Akhil Das
You need to find the bottleneck here; it could be your network (if the data is huge), or your producer code isn't pushing at 20k/s. If you are able to produce at 20k/s, then make sure you are able to receive at that rate (try it without spark). Thanks Best Regards On Sat, Jul 25, 2015 at 3:29 PM,

Re: use S3-Compatible Storage with spark

2015-07-28 Thread Akhil Das
With s3n, try this out: s3service.s3-endpoint - The host name of the S3 service. You should only ever change this value from the default if you need to contact an alternative S3 endpoint for testing purposes. Default: s3.amazonaws.com Thanks Best Regards On Tue, Jul 28, 2015 at 1:54 PM, Schmirr
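
A sketch of doing the same from the Spark side with the s3a connector (the endpoint and keys are placeholders, and the exact property names depend on the hadoop-aws version):

  sc.hadoopConfiguration.set("fs.s3a.endpoint", "storage.example.com")
  sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
  sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")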

Re: Question abt serialization

2015-07-28 Thread Akhil Das
Did you try it with just (comment out line 27): println "Count of spark: " + file.filter({ s -> s.contains('spark') }).count() Thanks Best Regards On Sun, Jul 26, 2015 at 12:43 AM, tog guillaume.all...@gmail.com wrote: Hi I have been using Spark for quite some time using either scala or python.

Re: Multiple operations on same DStream in Spark Streaming

2015-07-28 Thread Akhil Das
One approach would be to store the batch data in an intermediate storage (like HBase/MySQL or even in zookeeper), and inside your filter function you just go and read the previous value from this storage and do whatever operation that you are supposed to do. Thanks Best Regards On Sun, Jul 26,

Re: Getting java.net.BindException when attempting to start Spark master on EC2 node with public IP

2015-07-28 Thread Akhil Das
Did you try binding to 0.0.0.0? Thanks Best Regards On Mon, Jul 27, 2015 at 10:37 PM, Wayne Song wayne.e.s...@gmail.com wrote: Hello, I am trying to start a Spark master for a standalone cluster on an EC2 node. The CLI command I'm using looks like this: Note that I'm specifying the

Re: java.lang.ArrayIndexOutOfBoundsException: 0 on Yarn Client

2015-07-28 Thread Akhil Das
); } From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Tuesday, July 28, 2015 1:52 PM To: Manohar Reddy Cc: user@spark.apache.org Subject: Re: java.lang.ArrayIndexOutOfBoundsException: 0 on Yarn Client Put a try catch inside your code and inside the catch print

Re: use S3-Compatible Storage with spark

2015-07-27 Thread Akhil Das
) 2015-07-27 11:17 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com: So you are able to access your AWS S3 with s3a now? What is the error that you are getting when you try to access the custom storage with fs.s3a.endpoint? Thanks Best Regards On Mon, Jul 27, 2015 at 2:44 PM, Schmirr Wurst

Re: suggest coding platform

2015-07-27 Thread Akhil Das
How about IntelliJ? It also has a Terminal tab. Thanks Best Regards On Fri, Jul 24, 2015 at 6:06 PM, saif.a.ell...@wellsfargo.com wrote: Hi all, I tried Notebook Incubator Zeppelin, but I am not completely happy with it. What do you people use for coding? Anything with auto-complete,

Re: Encryption on RDDs or in-memory on Apache Spark

2015-07-27 Thread Akhil Das
Have a look at the current security support https://spark.apache.org/docs/latest/security.html, Spark does not have any encryption support for objects in memory out of the box. But if your concern is to protect the data being cached in memory, then you can easily encrypt your objects in memory

Re: Spark - Eclipse IDE - Maven

2015-07-27 Thread Akhil Das
You can follow this doc https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup Thanks Best Regards On Fri, Jul 24, 2015 at 10:56 AM, Siva Reddy ksiv...@gmail.com wrote: Hi All, I am trying to setup the Eclipse (LUNA) with Maven so that I

Re: ERROR TaskResultGetter: Exception while getting task result when reading avro files that contain arrays

2015-07-27 Thread Akhil Das
It's a serialization error with nested schemas, I guess. You can look at Twitter's chill-avro serializer library. Here are two discussions on the same: - https://issues.apache.org/jira/browse/SPARK-3447 -

Re: java.lang.NoSuchMethodError for list.toMap.

2015-07-27 Thread Akhil Das
What's in your build.sbt? You could be messing with the Scala version, it seems. Thanks Best Regards On Fri, Jul 24, 2015 at 2:15 AM, Dan Dong dongda...@gmail.com wrote: Hi, When I ran with spark-submit the following simple Spark program: import org.apache.spark.SparkContext._ import

Re: spark dataframe gc

2015-07-27 Thread Akhil Das
This spark.shuffle.sort.bypassMergeThreshold might help. You could also try switching the shuffle manager from sort to hash. You can see more configuration options here: https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior. Thanks Best Regards On Fri, Jul 24, 2015 at 3:33

Re: ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!

2015-07-27 Thread Akhil Das
For each of your jobs, you can pass spark.ui.port to bind to a different port. Thanks Best Regards On Fri, Jul 24, 2015 at 7:49 PM, Joji John jj...@ebates.com wrote: Thanks Ajay. The way we wrote our spark application is that we have a generic python code, multiple instances of which can
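
A minimal sketch of giving one application its own UI port (the port number is arbitrary):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("second-job")
    .set("spark.ui.port", "4041") // default is 4040; a distinct port avoids the bind retries
  val sc = new SparkContext(conf)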

Re: use S3-Compatible Storage with spark

2015-07-27 Thread Akhil Das
? 2015-07-20 18:11 GMT+02:00 Schmirr Wurst schmirrwu...@gmail.com: Thanks, that is what I was looking for... Any idea where I have to store and reference the corresponding hadoop-aws-2.6.0.jar?: java.io.IOException: No FileSystem for scheme: s3n 2015-07-20 8:33 GMT+02:00 Akhil Das

Re: Writing binary files in Spark

2015-07-25 Thread Akhil Das
alternative from Python? And also, I want to write the raw bytes of my object into files on disk, and not using some Serialization format to be read back into Spark. Is it possible? Any alternatives for that? Thanks, Oren On Thu, Jul 23, 2015 at 8:04 PM Akhil Das ak...@sigmoidanalytics.com

Re: How to restart Twitter spark stream

2015-07-24 Thread Akhil Das
, but at the end I will have only hashtags without statuses. Is that correct, or did I miss something? Thanks, Zoran On Wed, Jul 22, 2015 at 12:41 AM, Akhil Das ak...@sigmoidanalytics.com wrote: That was pseudo code; a working version would look like this: val stream

Re: Using Dataframe write with newHdoopApi

2015-07-24 Thread Akhil Das
PM, ayan guha guha.a...@gmail.com wrote: Hi Akhil Thanks.Will definitely take a look. Couple of questions 1. Is it possible to use newHadoopAPI from dataframe.write or saveAs? 2. is esDF usable rom Python? On Fri, Jul 24, 2015 at 2:29 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Did

Re: What if request cores are not satisfied

2015-07-24 Thread Akhil Das
I guess it would wait for some time and throw up something like this: "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". Thanks Best Regards On Thu, Jul 23, 2015 at 7:53 AM, bit1...@163.com bit1...@163.com wrote:

Re: writing/reading multiple Parquet files: Failed to merge incompatible data types StringType and StructType

2015-07-23 Thread Akhil Das
Currently, the only way for you would be to create a proper schema for the data. This is not a bug, but you could open a JIRA for the feature (since this would help others solve similar use-cases), and in a future version it could be implemented and included. Thanks Best Regards On Tue, Jul 21,

Re: spark thrift server supports timeout?

2015-07-23 Thread Akhil Das
Here are a few more configurations: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ConfigurationPropertiesinthehive-site.xmlFile - I can't find anything on the timeouts, though. Thanks Best Regards On Wed, Jul 22, 2015 at 1:01 AM, Judy Nash

Re: 1.4.0 classpath issue with spark-submit

2015-07-23 Thread Akhil Das
You can try adding that jar to SPARK_CLASSPATH (it's deprecated though) in the spark-env.sh file. Thanks Best Regards On Tue, Jul 21, 2015 at 7:34 PM, Michal Haris michal.ha...@visualdna.com wrote: I have a spark program that uses dataframes to query hive and I run it both as a spark-shell for

Re: NullPointerException inside RDD when calling sc.textFile

2015-07-23 Thread Akhil Das
Did you try:

  val data = indexed_files.groupByKey
  val modified_data = data.map { a =>
    var name = a._2.mkString(",")
    (a._1, name)
  }
  modified_data.foreach { a =>
    var file = sc.textFile(a._2)
    println(file.count)
  }

Thanks Best Regards On Wed, Jul 22, 2015 at 2:18 AM, MorEru

Re: problems running Spark on a firewalled remote YARN cluster via SOCKS proxy

2015-07-23 Thread Akhil Das
It looks like it's picking up the wrong namenode URI from the HADOOP_CONF_DIR; make sure it is proper. Also, for submitting a spark job to a remote cluster, you might want to look at spark.driver.host and spark.driver.port. Thanks Best Regards On Wed, Jul 22, 2015 at 8:56 PM, rok

Re: Using Dataframe write with newHdoopApi

2015-07-23 Thread Akhil Das
Did you happen to look into esDF https://github.com/elastic/elasticsearch-hadoop/issues/441? You can open an issue over here if that doesn't solve your problem: https://github.com/elastic/elasticsearch-hadoop/issues Thanks Best Regards On Tue, Jul 21, 2015 at 5:33 PM, ayan guha

Re: Writing binary files in Spark

2015-07-23 Thread Akhil Das
You can look into .saveAsObjectFile. Thanks Best Regards On Thu, Jul 23, 2015 at 8:44 PM, Oren Shpigel o...@yowza3d.com wrote: Hi, I use Spark to read binary files using SparkContext.binaryFiles(), and then do some calculations, processing, and manipulations to get new objects (also
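
A minimal sketch of the round trip (assuming a SparkContext `sc`; the path is hypothetical):

  val rdd = sc.parallelize(Seq("a", "b", "c"))
  rdd.saveAsObjectFile("hdfs:///tmp/objects")             // writes Java-serialized objects
  val back = sc.objectFile[String]("hdfs:///tmp/objects") // reads them back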

Re: How to restart Twitter spark stream

2015-07-22 Thread Akhil Das
. It would be great if somebody with experience on this could comment on these concerns. Thanks, Zoran On Mon, Jul 20, 2015 at 12:19 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Jorn meant something like this: val filteredStream = twitterStream.transform(rdd => { val newRDD

Re: use S3-Compatible Storage with spark

2015-07-21 Thread Akhil Das
where I have to store and reference the corresponding hadoop-aws-2.6.0.jar?: java.io.IOException: No FileSystem for scheme: s3n 2015-07-20 8:33 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com: Not in the URI, but in the hadoop configuration you can specify it: <property> <name>fs.s3a.endpoint

Re: spark streaming 1.3 issues

2015-07-21 Thread Akhil Das
I'd suggest upgrading to 1.4 as it has better metrics and UI. Thanks Best Regards On Mon, Jul 20, 2015 at 7:01 PM, Shushant Arora shushantaror...@gmail.com wrote: Is coalesce not applicable to kafkaStream? How to do coalesce on kafkadirectstream, it's not there in the api? Shall calling

Re: Apache Spark : spark.eventLog.dir on Windows Environment

2015-07-21 Thread Akhil Das
Do you have HADOOP_HOME, HADOOP_CONF_DIR and hadoop's winutils.exe in the environment? Thanks Best Regards On Mon, Jul 20, 2015 at 5:45 PM, nitinkalra2000 nitinkalra2...@gmail.com wrote: Hi All, I am working on Spark 1.4 on windows environment. I have to set eventLog directory so that I can

Re: What is the correct syntax of using Spark streamingContext.fileStream()?

2015-07-21 Thread Akhil Das
Here's two ways of doing that. Without the filter function:

  JavaPairDStream<String, String> foo = ssc.<String, String, SequenceFileInputFormat>fileStream("/tmp/foo");

With the filter function:

  JavaPairInputDStream<LongWritable, Text> foo = ssc.fileStream("/tmp/foo", LongWritable.class,

Re: use S3-Compatible Storage with spark

2015-07-21 Thread Akhil Das
("fs.s3n.endpoint", "test.com") And I continue to get my data from amazon, how could it be? (I also use s3n in my text url) 2015-07-21 9:30 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com: You can add the jar in the classpath, and you can set the property like: sc.hadoopConfiguration.set("fs.s3a.endpoint

Re: Apache Spark : spark.eventLog.dir on Windows Environment

2015-07-21 Thread Akhil Das
...@gmail.com wrote: Hi Akhil, I don't have HADOOP_HOME or HADOOP_CONF_DIR or even winutils.exe. What's the configuration required for this? From where can I get winutils.exe? Thanks and Regards, Nitin Kalra On Tue, Jul 21, 2015 at 1:30 PM, Akhil Das ak...@sigmoidanalytics.com wrote

Re: k-means iteration not terminate

2015-07-21 Thread Akhil Das
It could be a GC pause or something; you need to check in the stages tab and see what is taking time. If you upgrade to Spark 1.4, it has a better UI and DAG visualization, which helps you debug better. Thanks Best Regards On Mon, Jul 20, 2015 at 8:21 PM, Pa Rö paul.roewer1...@googlemail.com wrote:

Re: use S3-Compatible Storage with spark

2015-07-20 Thread Akhil Das
) is assumed. </description> </property> Thanks Best Regards On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst schmirrwu...@gmail.com wrote: I want to use pithos; where can I specify that endpoint, is it possible in the url? 2015-07-19 17:22 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com

Re: Exception while triggering spark job from remote jvm

2015-07-20 Thread Akhil Das
Just make sure there is no firewall/network blocking the requests, as it's complaining about a timeout. Thanks Best Regards On Mon, Jul 20, 2015 at 1:14 AM, ankit tyagi ankittyagi.mn...@gmail.com wrote: Just to add more information. I have checked the status of this file; not a single block is

Re: How to restart Twitter spark stream

2015-07-20 Thread Akhil Das
Jorn meant something like this:

  val filteredStream = twitterStream.transform(rdd => {
    val newRDD = scc.sc.textFile("/this/file/will/be/updated/frequently").map(x => (x, 1))
    rdd.join(newRDD)
  })

newRDD will work like a filter when you do the join. Thanks Best Regards On Sun, Jul 19, 2015 at 9:32

Re: use S3-Compatible Storage with spark

2015-07-19 Thread Akhil Das
Could you name the storage service that you are using? Most of them provide an S3-like REST API endpoint for you to hit. Thanks Best Regards On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst schmirrwu...@gmail.com wrote: Hi, I wonder how to use S3-compatible storage in Spark? If I'm using

Re: Spark APIs memory usage?

2015-07-19 Thread Akhil Das
. (no matrices loaded), the same exception is coming. Can anyone tell what createDataFrame does internally? Are there any alternatives for it? On Fri, Jul 17, 2015 at 6:43 PM, Akhil Das ak...@sigmoidanalytics.com wrote: I suspect it's the numpy filling up memory. Thanks Best Regards On Fri

Re: streaming and piping to R, sending all data in window to pipe()

2015-07-19 Thread Akhil Das
Did you try inputs.repartition(1).foreachRDD(..)? Thanks Best Regards On Fri, Jul 17, 2015 at 9:51 PM, PAULI, KEVIN CHRISTIAN [AG-Contractor/1000] kevin.christian.pa...@monsanto.com wrote: Spark newbie here, using Spark 1.3.1. I’m consuming a stream and trying to pipe the data from the

Re: Spark APIs memory usage?

2015-07-17 Thread Akhil Das
Can you paste the code? How much memory does your system have and how big is your dataset? Did you try df.persist(StorageLevel.MEMORY_AND_DISK)? Thanks Best Regards On Fri, Jul 17, 2015 at 5:14 PM, Harit Vishwakarma harit.vishwaka...@gmail.com wrote: Thanks, Code is running on a single

Re: Spark APIs memory usage?

2015-07-17 Thread Akhil Das
= sqlCtx.createDataFrame(rdd2) 4. df.save() # in parquet format It throws an exception in the createDataFrame() call. I don't know what exactly it is creating? Everything in memory? Or can I make it persist simultaneously while getting created? Thanks On Fri, Jul 17, 2015 at 5:16 PM, Akhil Das ak

Re: DataFrame InsertIntoJdbc() Runtime Exception on cluster

2015-07-16 Thread Akhil Das
Which version of Spark are you using? insertIntoJDBC is deprecated (from 1.4.0); you may use write.jdbc() instead. Thanks Best Regards On Wed, Jul 15, 2015 at 2:43 PM, Manohar753 manohar.re...@happiestminds.com wrote: Hi All, Am trying to add a few new rows to an existing table in mysql using
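
A sketch of the 1.4+ API (assuming a DataFrame `df`; URL, table name, and credentials are placeholders):

  import java.util.Properties

  val props = new Properties()
  props.setProperty("user", "dbuser")
  props.setProperty("password", "dbpass")
  df.write.jdbc("jdbc:mysql://host:3306/mydb", "mytable", props) // replaces insertIntoJDBC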

Re: RestSubmissionClient Basic Auth

2015-07-16 Thread Akhil Das
likely would it be that a change like that goes through? Would it be rejected as an uncommon scenario? I really don't want to have this as a separate form of the branch. Thanks, Joel -- From: Akhil Das ak...@sigmoidanalytics.com Sent: Wednesday, July 15, 2015 2:07

Re: Job aborted due to stage failure: Task not serializable:

2015-07-16 Thread Akhil Das
Did you try this?

  val out = lines.filter(xx => {
    val y = xx
    val x = broadcastVar.value
    var flag: Boolean = false
    for (a <- x) {
      if (y.contains(a)) flag = true
    }
    flag
  })

Thanks Best Regards On Wed, Jul 15, 2015 at 8:10 PM, Naveen Dabas naveen.u...@ymail.com wrote: I

Re: Spark cluster read local files

2015-07-16 Thread Akhil Das
Yes you can do that, just make sure you rsync the same file to the same location on every machine. Thanks Best Regards On Thu, Jul 16, 2015 at 5:50 AM, Julien Beaudan jbeau...@stottlerhenke.com wrote: Hi all, Is it possible to use Spark to assign each machine in a cluster the same task, but

Re: Spark on EMR with S3 example (Python)

2015-07-15 Thread Akhil Das
I think any requests going to s3*:// require the credentials. If they have made it public (via http) then you won't require the keys. Thanks Best Regards On Wed, Jul 15, 2015 at 2:26 AM, Pagliari, Roberto rpagli...@appcomsci.com wrote: Hi Sujit, I just wanted to access public datasets on
