SFTP Compressed CSV into Dataframe

2016-03-02 Thread Benjamin Kim
I wonder if anyone has opened an SFTP connection to read a remote gzipped CSV file? I am able to download the file locally first using the SFTP client in the spark-sftp package. Then, I load the file into a dataframe using the spark-csv package, which automatically decompresses the file. I just want
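
A minimal sketch of that two-step flow, assuming JSch as the SFTP client (host, credentials, and paths are placeholders; the local path only works where the driver can read it):

```scala
import com.jcraft.jsch.{ChannelSftp, JSch}

// Step 1: pull the remote file down over SFTP
val session = new JSch().getSession("user", "sftp.example.com", 22)
session.setPassword("secret")
session.setConfig("StrictHostKeyChecking", "no")
session.connect()
val sftp = session.openChannel("sftp").asInstanceOf[ChannelSftp]
sftp.connect()
sftp.get("/remote/data.csv.gz", "/tmp/data.csv.gz")
sftp.disconnect(); session.disconnect()

// Step 2: spark-csv reads .gz files transparently via the Hadoop codecs
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("file:///tmp/data.csv.gz")
```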

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Benjamin Kim
I want to ask about something related to this. Does anyone know if there is or will be a command-line equivalent of the spark-shell client for Livy Spark Server or any other Spark Job Server? The reason that I am asking is that spark-shell does not handle multiple users on the same server well. Since a Spa

Re: SFTP Compressed CSV into Dataframe

2016-03-03 Thread Benjamin Kim
On Mar 2, 2016, at 11:17 AM, Benjamin Kim wrote: > > I wonder if anyone has opened a SFTP connection to open a remote GZIP CSV > file? I am able to download the file first locally using the SFTP Client in > the spark-sftp package. Then, I load the file into a dataframe using the >

Re: SFTP Compressed CSV into Dataframe

2016-03-03 Thread Benjamin Kim
ote: > > (-user) > > On Thursday 03 March 2016 10:09 PM, Benjamin Kim wrote: >> I forgot to mention that we will be scheduling this job using Oozie. So, we >> will not be able to know which worker node is going to be running this. >> If we try to do anything local, i

Re: Steps to Run Spark Scala job from Oozie on EC2 Hadoop cluster

2016-03-07 Thread Benjamin Kim
To comment… At my company, we have not gotten it to work in any other mode than local. If we try any of the yarn modes, it fails with a “file does not exist” error when trying to locate the executable jar. I mentioned this to the Hue users group, which we used for this, and they replied that th

S3 Zip File Loading Advice

2016-03-08 Thread Benjamin Kim
I am wondering if anyone can help. Our company stores zipped CSV files in S3, which has been a big headache from the start. I was wondering if anyone has created a way to iterate through several subdirectories (s3n://events/2016/03/01/00, s3n://events/2016/03/01/01, etc.) in S3 to find the newest file
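
A sketch of one way to find the newest key under a prefix with the AWS SDK (bucket and prefix are placeholders; listings over 1000 keys need pagination via listNextBatchOfObjects):

```scala
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ListObjectsRequest
import scala.collection.JavaConverters._

val s3 = new AmazonS3Client() // credentials come from the environment/instance profile
val listing = s3.listObjects(
  new ListObjectsRequest().withBucketName("events").withPrefix("2016/03/01/"))
// pick the most recently modified object under the prefix
val newest = listing.getObjectSummaries.asScala.maxBy(_.getLastModified.getTime)
println(s"newest key: ${newest.getKey}")
```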

Re: S3 Zip File Loading Advice

2016-03-09 Thread Benjamin Kim
h zip? Single file archives are processed just > like text as long as it is one of the supported compression formats. > > Regards > Sab > > On Wed, Mar 9, 2016 at 10:33 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > I am wondering if anyone can help. > >

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Hi Ted, I see that you’re working on the hbase-spark module for hbase. I recently packaged the SparkOnHBase project and gave it a test run. It works like a charm on CDH 5.4 and 5.5. All I had to do was add /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the classpath.txt fil

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
to root pom.xml: > hbase-spark > > Then you would be able to build the module yourself. > > hbase-spark module uses APIs which are compatible with hbase 1.0 > > Cheers > > On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote:

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
s > > On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > Hi Ted, > > I see that you’re working on the hbase-spark module for hbase. I recently > packaged the SparkOnHBase project and gave it a test run. It works like a > charm on CDH

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
compressionByName() resides in class with @InterfaceAudience.Private which > got moved in master branch. > > So looks like there is some work to be done for backporting to branch-1 :-) > > On Sun, Mar 13, 2016 at 1:35 PM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote:

Re: S3 Zip File Loading Advice

2016-03-15 Thread Benjamin Kim
ould you wrap the ZipInputStream in a List, since a subtype of > TraversableOnce[?] is required? > > case (name, content) => List(new ZipInputStream(content.open)) > > Xinh > > On Wed, Mar 9, 2016 at 7:07 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: >

Re: new object store driver for Spark

2016-03-22 Thread Benjamin Kim
Hi Gil, Currently, our company uses S3 heavily for data storage. Can you further explain the benefits of this in relation to S3 when the pending patch does come out? Also, I have heard of Swift from others. Can you explain to me the pros and cons of Swift compared to HDFS? It can be just a brie

BinaryFiles to ZipInputStream

2016-03-23 Thread Benjamin Kim
I need a little help. I am loading into Spark 1.6 zipped csv files stored in s3. First of all, I am able to get the List of file keys that have a modified date within a range of time by using the AWS SDK Objects (AmazonS3Client, ObjectListing, S3ObjectSummary, ListObjectsRequest, GetObjectReques
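
The pattern this thread converges on is sc.binaryFiles plus ZipInputStream. A minimal sketch, assuming each archive holds text entries (the path is a placeholder):

```scala
import java.util.zip.ZipInputStream
import scala.io.Source

val lines = sc.binaryFiles("s3n://events/2016/03/01/00/").flatMap {
  case (name, content) =>
    val zis = new ZipInputStream(content.open())
    // ZipInputStream signals EOF at the end of each entry, so
    // Source reads exactly one entry per getNextEntry call
    Iterator.continually(zis.getNextEntry)
      .takeWhile(_ != null)
      .flatMap(_ => Source.fromInputStream(zis).getLines())
      .toList // materialize before the stream goes out of scope
}
```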

Does Spark CSV accept a CSV String

2016-03-30 Thread Benjamin Kim
I have a quick question. I have downloaded multiple zipped files from S3 and unzipped each one of them into strings. The next step is to parse them using a CSV parser. I want to know if there is a way to easily use the spark-csv package for this? Thanks, Ben -

Re: Does Spark CSV accept a CSV String

2016-03-30 Thread Benjamin Kim
lebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> > > http://talebzadehmich.wordpress.com <http://talebzadehmi

Re: Does Spark CSV accept a CSV String

2016-03-30 Thread Benjamin Kim
'),'-MM-dd')) > AS TransactionDate > , TransactionType > , Description > , Value > , Balance > , AccountName > , AccountNumber > FROM tmp > """ > sql(sqltext) > > println ("\nFinished at"

can spark-csv package accept strings instead of files?

2016-04-01 Thread Benjamin Kim
Does anyone know if this is possible? I have an RDD loaded with rows of CSV data strings. Each string represents the header row and multiple rows of data along with delimiters. I would like to feed each through a CSV parser to convert the data into a dataframe and, ultimately, UPSERT a Hive/HBase
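
The answer that emerges downthread is CsvParser.csvRdd from spark-csv. A sketch using the builder form (exact signatures vary across spark-csv versions):

```scala
import com.databricks.spark.csv.CsvParser

// csvLines: RDD[String], one CSV line per element, header included
val csvLines = sc.parallelize(Seq("id,name", "1,alice", "2,bob"))
val df = new CsvParser()
  .withUseHeader(true)
  .csvRdd(sqlContext, csvLines)
df.show()
```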

Monitoring S3 Bucket with Spark Streaming

2016-04-08 Thread Benjamin Kim
Has anyone monitored an S3 bucket or directory using Spark Streaming and pulled any new files to process? If so, can you provide basic Scala coding help on this? Thanks, Ben - To unsubscribe, e-mail: user-unsubscr...@spark.apa

Re: Monitoring S3 Bucket with Spark Streaming

2016-04-09 Thread Benjamin Kim
ion.set("fs.s3n.awsAccessKeyId", > AccessKeyId) > ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", > AWSSecretAccessKey) > > val inputS3Stream = ssc.textFileStream("s3://example_bucket/folder") > > This code will probe for new S3 files created i

Re: Monitoring S3 Bucket with Spark Streaming

2016-04-09 Thread Benjamin Kim
sKey", > AWSSecretAccessKey) > > val inputS3Stream = ssc.textFileStream("s3://example_bucket/folder") > > This code will probe for new S3 files created in your every batch interval. > > Thanks, > Natu > > On Fri, Apr 8, 2016 at 9:14 PM, Benja

Re: Monitoring S3 Bucket with Spark Streaming

2016-04-09 Thread Benjamin Kim
> Sent from my iPhone > > On Apr 9, 2016, at 9:55 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > >> Nezih, >> >> This looks like a good alternative to having the Spark Streaming job check >> for new files on its own. Do you know if there is a wa

Re: Monitoring S3 Bucket with Spark Streaming

2016-04-09 Thread Benjamin Kim
endpoint of this notification. This would then convey to a listening Spark Streaming job the file information to download. I like this! Cheers, Ben > On Apr 9, 2016, at 9:54 AM, Benjamin Kim wrote: > > This is awesome! I have someplace to start from. > > Thanks, > Ben

Re: Monitoring S3 Bucket with Spark Streaming

2016-04-09 Thread Benjamin Kim
, please let me know. Thanks, Ben > On Apr 9, 2016, at 2:49 PM, Benjamin Kim wrote: > > This was easy! > > I just created a notification on a source S3 bucket to kick off a Lambda > function that would decompress the dropped file and save it to another S3 > bucket. In return,

Re: Monitoring S3 Bucket with Spark Streaming

2016-04-12 Thread Benjamin Kim
"true") // Automatically infer data types .load("s3://" + bucket + "/" + key) //save to hbase }) ssc.checkpoint(checkpointDirectory) // set checkpoint directory ssc } Thanks, Ben > On Apr 9, 2016, at 6:12 PM, Benjamin Kim wrote: > &g

JSON Usage

2016-04-14 Thread Benjamin Kim
I was wondering what would be the best way to use JSON in Spark/Scala. I need to look up values of fields in a collection of records to form a URL and download that file at that location. I was thinking an RDD would be perfect for this. I just want to hear from others who might have more experience
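
A sketch of the RDD approach (field names and URL shape are hypothetical; sqlContext.read.json expects one JSON object per line):

```scala
import scala.io.Source

val records = sqlContext.read.json("s3n://bucket/records.json")
val downloads = records.select("host", "path").rdd.map { row =>
  val url = s"http://${row.getString(0)}/${row.getString(1)}"
  (url, Source.fromURL(url).mkString) // fetch the file for each record
}
```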

Re: can spark-csv package accept strings instead of files?

2016-04-15 Thread Benjamin Kim
> https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvParser.scala#L150 > > <https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvParser.scala#L150>. > > Thanks! > > On 2 Apr 2016 2:47

Re: JSON Usage

2016-04-15 Thread Benjamin Kim
Karau wrote: > > You could certainly use RDDs for that, you might also find using Dataset > selecting the fields you need to construct the URL to fetch and then using > the map function to be easier. > > On Thu, Apr 14, 2016 at 12:01 PM, Benjamin Kim <mailto:bbuil...@gmail.

Re: can spark-csv package accept strings instead of files?

2016-04-15 Thread Benjamin Kim
codes below? > > val csvRDD = ...your processing for csv rdd.. > val df = new CsvParser().csvRdd(sqlContext, csvRDD, useHeader = true) > > Thanks! > > On 16 Apr 2016 1:35 a.m., "Benjamin Kim" <mailto:bbuil...@gmail.com>> wrote: > Hi Hyukjin, >

Re: can spark-csv package accept strings instead of files?

2016-04-15 Thread Benjamin Kim
you try this codes below? > > val csvRDD = ...your processing for csv rdd.. > val df = new CsvParser().csvRdd(sqlContext, csvRDD, useHeader = true) > > Thanks! > > On 16 Apr 2016 1:35 a.m., "Benjamin Kim" <mailto:bbuil...@gmail.com>> wrote: > Hi Hyukjin,

Re: JSON Usage

2016-04-17 Thread Benjamin Kim
d > I create it based on the JSON structure below, especially the nested elements. > > Thanks, > Ben > > >> On Apr 14, 2016, at 3:46 PM, Holden Karau > <mailto:hol...@pigscanfly.ca>> wrote: >> >> You could certainly use RDDs for that, you might

HBase Spark Module

2016-04-20 Thread Benjamin Kim
I see that the new CDH 5.7 has been released with the HBase Spark module built-in. I was wondering if I could just download it and use the hbase-spark jar file for CDH 5.5. Has anyone tried this yet? Thanks, Ben - To unsubscribe,

Save DataFrame to HBase

2016-04-21 Thread Benjamin Kim
Has anyone found an easy way to save a DataFrame into HBase? Thanks, Ben - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: Save DataFrame to HBase

2016-04-21 Thread Benjamin Kim
Thu, Apr 21, 2016 at 6:52 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > Has anyone found an easy way to save a DataFrame into HBase? > > Thanks, > Ben > > > - > To unsubscribe, e-mail

Re: Save DataFrame to HBase

2016-04-24 Thread Benjamin Kim
able with hbase storage handler and > hiveContext but it failed due to a bug. > > I was able to persist the DF to hbase using Apache Phoenix which was pretty > simple. > > Thank you. > Daniel > > On 21 Apr 2016, at 16:52, Benjamin Kim <mailto:bbuil...@gmail.com>
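
A minimal sketch of the Phoenix route Daniel describes, assuming the phoenix-spark artifact is on the classpath (table name and ZooKeeper quorum are placeholders; Phoenix performs upserts and requires overwrite mode):

```scala
df.write
  .format("org.apache.phoenix.spark")
  .mode("overwrite")
  .option("table", "MY_TABLE")
  .option("zkUrl", "zkhost:2181")
  .save()
```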

Convert DataFrame to Array of Arrays

2016-04-24 Thread Benjamin Kim
I have data in a DataFrame loaded from a CSV file. I need to load this data into HBase using an RDD formatted in a certain way. val rdd = sc.parallelize( Array(key1, (ColumnFamily, ColumnName1, Value1), (ColumnFamily, ColumnName2, Value2), (
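
One plausible shape for that conversion, sketched with hypothetical column names (the exact tuple layout depends on the HBase writer being fed):

```scala
val rdd = df.rdd.map { row =>
  val key = row.getAs[String]("key")
  (key, Seq(
    ("cf", "ColumnName1", row.getAs[String]("col1")),
    ("cf", "ColumnName2", row.getAs[String]("col2"))))
}
```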

Re: Save DataFrame to HBase

2016-04-27 Thread Benjamin Kim
i Benjamin, > Yes it should work. > > Let me know if you need further assistance I might be able to get the code > I've used for that project. > > Thank you. > Daniel > > On 24 Apr 2016, at 17:35, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: >

Re: Save DataFrame to HBase

2016-04-27 Thread Benjamin Kim
? Thanks, Ben > On Apr 21, 2016, at 6:56 AM, Ted Yu wrote: > > The hbase-spark module in Apache HBase (coming with hbase 2.0 release) can do > this. > > On Thu, Apr 21, 2016 at 6:52 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > Has anyone found an easy way

Spark 2.0+ Structured Streaming

2016-04-28 Thread Benjamin Kim
Can someone explain to me how the new Structured Streaming works in the upcoming Spark 2.0+? I’m a little hazy on how data will be stored and referenced if it can be queried and/or batch processed directly from streams, and whether the data will be append-only or there will be some sort of upsert capa

Re: Spark 2.0 Release Date

2016-04-28 Thread Benjamin Kim
Next Thursday is Databricks' webinar on Spark 2.0. If you are attending, I bet many are going to ask when the release will be. Last time they did this, Spark 1.6 came out not too long afterward. > On Apr 28, 2016, at 5:21 AM, Sean Owen wrote: > > I don't know if anyone has begun a firm discuss

Re: Save DataFrame to HBase

2016-05-10 Thread Benjamin Kim
> Cheers > > On Apr 27, 2016, at 10:31 PM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > >> Hi Ted, >> >> Do you know when the release will be? I also see some documentation for >> usage of the hbase-spark module at the hbase website. But, I d

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-15 Thread Benjamin Kim
Hi Ofir, I just recently saw the webinar with Reynold Xin. He mentioned the Spark Session unification efforts, but I don’t remember him covering the DataSet for Structured Streaming, aka Continuous Applications as he put it. He did mention streaming or unlimited DataFrames for Structured Streaming so one can

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-15 Thread Benjamin Kim
obile: +972-54-7801286 | Email: > ofir.ma...@equalum.io <mailto:ofir.ma...@equalum.io> > On Sun, May 15, 2016 at 11:58 PM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > Hi Ofir, > > I just recently saw the webinar with Reynold Xin. He mentioned the Spark

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-16 Thread Benjamin Kim
I have a curiosity question. These forever/unlimited DataFrames/DataSets will persist and be query-capable. I am still foggy about how this data will be stored. As far as I know, memory is finite. Will the data be spilled to disk and be retrievable if the query spans data not in memory? Is Tachy

Spark Streaming S3 Error

2016-05-20 Thread Benjamin Kim
I am trying to stream files from an S3 bucket using CDH 5.7.0’s version of Spark 1.6.0. It seems not to work. I keep getting this error. Exception in thread "JobGenerator" java.lang.VerifyError: Bad type on operand stack Exception Details: Location: org/apache/hadoop/fs/s3native/Jets3tNat

Re: Spark Streaming S3 Error

2016-05-21 Thread Benjamin Kim
could be wrong. Thanks, Ben > On May 21, 2016, at 4:18 AM, Ted Yu wrote: > > Maybe more than one version of jets3t-xx.jar was on the classpath. > > FYI > > On Fri, May 20, 2016 at 8:31 PM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > I am trying to stream

Re: Spark Streaming S3 Error

2016-05-21 Thread Benjamin Kim
Ben > On May 21, 2016, at 4:18 AM, Ted Yu wrote: > > Maybe more than one version of jets3t-xx.jar was on the classpath. > > FYI > > On Fri, May 20, 2016 at 8:31 PM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > I am trying to stream files from an S3 buck

Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
Does anyone know how to save data in a DataFrame to a table partitioned using an existing column reformatted into a derived column? val partitionedDf = df.withColumn("dt", concat(substring($"timestamp", 1, 10), lit(" "), substring($"timestamp", 12, 2), lit(":00")))
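
A sketch of the write side under those assumptions (table name is a placeholder; partitionBy uses the derived dt column):

```scala
import org.apache.spark.sql.functions.{concat, lit, substring}
import sqlContext.implicits._

val partitionedDf = df.withColumn("dt",
  concat(substring($"timestamp", 1, 10), lit(" "),
         substring($"timestamp", 12, 2), lit(":00")))

partitionedDf.write
  .mode("append")
  .partitionBy("dt") // the derived column becomes the partition key
  .saveAsTable("events")
```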

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
`os_name` string COMMENT '', `os_version` string COMMENT '', `os_major_version` string COMMENT '',

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
t; > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> > > http://talebzadehmich.wordpress.com <http://tal

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
browser_major_version string > browser_minor_version string > os_family string > os_name string > os_version string > os_major_version string > os_minor_version string > # Partition Information > # col_name

Data Integrity / Model Quality Monitoring

2016-06-17 Thread Benjamin Kim
Has anyone run into this requirement? We have a need to track data integrity and model quality metrics of outcomes so that we can gauge both whether the data coming in is healthy and whether the models run against it are still performing and not giving faulty results. A nice-to-have would be to graph thes

Model Quality Tracking

2016-06-24 Thread Benjamin Kim
Has anyone implemented a way to track the performance of a data model? We currently have an algorithm to do record linkage and spit out statistics of matches, non-matches, and/or partial matches with reason codes of why we didn’t match accurately. In this way, we will know if something goes wron

Kudu Connector

2016-06-29 Thread Benjamin Kim
I was wondering if anyone, who is a Spark Scala developer, would be willing to continue the work done for the Kudu connector? https://github.com/apache/incubator-kudu/tree/master/java/kudu-spark/src/main/scala/org/kududb/spark/kudu I have been testing and using Kudu for the past month and compar

SnappyData and Structured Streaming

2016-07-05 Thread Benjamin Kim
I recently got a sales email from SnappyData, and after reading the documentation about what they offer, it sounds very similar to what Structured Streaming will offer w/o the underlying in-memory, spill-to-disk, CRUD compliant data storage in SnappyData. I was wondering if Structured Streaming

Re: SnappyData and Structured Streaming

2016-07-06 Thread Benjamin Kim
> Jags > SnappyData blog <http://www.snappydata.io/blog> > Download binary, source <https://github.com/SnappyDataInc/snappydata> > > > On Wed, Jul 6, 2016 at 12:49 AM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > I recently got a sales email from Sna

Re: SnappyData and Structured Streaming

2016-07-06 Thread Benjamin Kim
requencyCol 'retweets', timeSeriesColumn > 'tweetTime' )" > where 'tweetStreamTable' is created using the 'create stream table ...' SQL > syntax. > > > - > Jags > SnappyData blog <http://www.snappydata.io/blog> > D

Spark Website

2016-07-13 Thread Benjamin Kim
Has anyone noticed that the spark.apache.org is not working as supposed to? - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Spark Website

2016-07-13 Thread Benjamin Kim
It takes me to the directories instead of the webpage. > On Jul 13, 2016, at 11:45 AM, manish ranjan wrote: > > working for me. What do you mean 'as supposed to'? > > ~Manish > > > > On Wed, Jul 13, 2016 at 11:45 AM, Benjamin Kim <mailto:bbuil...

Re: transtition SQLContext to SparkSession

2016-07-18 Thread Benjamin Kim
From what I read, there are no more Contexts. "SparkContext, SQLContext, HiveContext merged into SparkSession" I have not tested it, so I don’t know if it’s true. Cheers, Ben > On Jul 18, 2016, at 8:37 AM, Koert Kuipers wrote: > > in my codebase i would like to gradually transition t
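
For reference, a minimal sketch of the unified Spark 2.0 entry point (enableHiveSupport stands in for what HiveContext provided; the old contexts remain reachable underneath):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("migration-example")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.sql("SELECT 1 AS x")
val sc = spark.sparkContext // the SparkContext is still accessible
```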

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Benjamin Kim
It is included in Cloudera’s CDH 5.8. > On Jul 22, 2016, at 6:13 PM, Mail.com wrote: > > Hbase Spark module will be available with Hbase 2.0. Is that out yet? > >> On Jul 22, 2016, at 8:50 PM, Def_Os wrote: >> >> So it appears it should be possible to use HBase's new hbase-spark module, if >>

HBase-Spark Module

2016-07-29 Thread Benjamin Kim
I would like to know if anyone has tried using the hbase-spark module? I tried to follow the examples in conjunction with CDH 5.8.0. I cannot find the HBaseTableCatalog class in the module or in any of the Spark jars. Can someone help? Thanks, Ben ---

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
Hi Stephen, I forgot to mention that I added these lines below to the spark-defaults.conf on the node with Spark SQL Thrift JDBC/ODBC Server running on it. Then, I restarted it. spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar spark.executor.extraClassPath=/usr/share/ja

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
18:35 GMT-08:00 Benjamin Kim >: > >> Hi Stephen, >> >> I forgot to mention that I added these lines below to the >> spark-default.conf on the node with Spark SQL Thrift JDBC/ODBC Server >> running on it. Then, I restarted it. >> >> spark.drive

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-25 Thread Benjamin Kim
e spark.worker.cleanup.appDataTtl config param. > > The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, > but I couldn't get this to work for whatever reason, so i'm sticking to the > --jars approach used in my examples. > > On Tue, Dec 22,

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-26 Thread Benjamin Kim
> but I couldn't get this to work for whatever reason, so i'm sticking to the > --jars approach used in my examples. > > On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <mailto:bbuil...@gmail.com>> wrote: > Stephen, > > Let me confirm. I just need to propagat

Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread Benjamin Kim
Hi David, My company uses Lambda to do simple data moving and processing using python scripts. I can see using Spark instead for the data processing would make it into a real production-level platform. Does this pave the way toward replacing the need for a pre-instantiated cluster in AWS or bought

Re: Spark with SAS

2016-02-03 Thread Benjamin Kim
You can download the Spark ODBC Driver. https://databricks.com/spark/odbc-driver-download > On Feb 3, 2016, at 10:09 AM, Jörn Franke wrote: > > This could be done through odbc. Keep in mind that you can run SaS jobs > directly on a Hadoop cluster using the SaS embedded process engine or dump

Re: Is there a any plan to develop SPARK with c++??

2016-02-03 Thread Benjamin Kim
Hi DaeJin, The closest thing I can think of is this. https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html Cheers, Ben > On Feb 3, 2016, at 9:49 PM, DaeJin Jung wrote: > > hello everyone, > I have a short question. > > I would like to improve perfor

Re: spark 1.6.0 connect to hive metastore

2016-02-09 Thread Benjamin Kim
I got the same problem when I added the Phoenix plugin jar to the driver and executor extra classpaths. Do you have those set too? > On Feb 9, 2016, at 1:12 PM, Koert Kuipers wrote: > > yes its not using derby i think: i can see the tables in my actual hive > metastore. > > i was using a syml

Re: SparkOnHBase : Which version of Spark its available

2016-02-17 Thread Benjamin Kim
Ted, Any idea as to when this will be released? Thanks, Ben > On Feb 17, 2016, at 2:53 PM, Ted Yu wrote: > > The HBASE JIRA below is for HBase 2.0 > > HBase Spark module would be back ported to hbase 1.3.0 > > FYI > > On Feb 17, 2016, at 1:13 PM, Chandeep Singh

Spark 1.6 Streaming with Checkpointing

2016-08-26 Thread Benjamin Kim
I am trying to implement checkpointing in my streaming application but I am getting a not serializable error. Has anyone encountered this? I am deploying this job in YARN clustered mode. Here is a snippet of the main parts of the code. object S3EventIngestion { //create and setup streaming
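
The usual fix is to build the whole pipeline inside a factory passed to StreamingContext.getOrCreate, so nothing non-serializable is captured from the driver. A sketch (checkpoint path and batch interval are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(checkpointDir: String): StreamingContext = {
  val conf = new SparkConf().setAppName("S3EventIngestion")
  val ssc = new StreamingContext(conf, Seconds(60))
  // define all DStream setup here so it is covered by the checkpoint
  ssc.checkpoint(checkpointDir)
  ssc
}

val checkpointDir = "hdfs:///checkpoints/s3events"
val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext(checkpointDir))
ssc.start()
ssc.awaitTermination()
```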

Spark SQL Tables on top of HBase Tables

2016-09-02 Thread Benjamin Kim
I was wondering if anyone has tried to create Spark SQL tables on top of HBase tables so that data in HBase can be accessed using Spark Thriftserver with SQL statements? This is similar what can be done using Hive. Thanks, Ben ---

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Benjamin Kim
s.com/> > > Disclaimer: Use it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable fo

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Benjamin Kim
may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 3 September 2016 at 20:31, Benjamin Kim <mailto:bbuil...@gm

Re: Spark Metrics: custom source/sink configurations not getting recognized

2016-09-06 Thread Benjamin Kim
We use Graphite/Grafana for custom metrics. We found Spark’s metrics not to be customizable. So, we write directly using Graphite’s API, which was very easy to do using Java’s socket library in Scala. It works great for us, and we are going one step further using Sensu to alert us if there is an
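
Writing to Graphite is just its plaintext protocol over a socket: "metric.path value epoch_seconds" on port 2003. A sketch (host and metric names are placeholders):

```scala
import java.io.PrintWriter
import java.net.Socket

def sendMetric(host: String, metric: String, value: Double): Unit = {
  val socket = new Socket(host, 2003) // 2003 is Graphite's plaintext port
  val out = new PrintWriter(socket.getOutputStream, true)
  out.println(s"$metric $value ${System.currentTimeMillis / 1000}")
  out.close()
  socket.close()
}

sendMetric("graphite.example.com", "spark.myjob.records_processed", 12345)
```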

Spark SQL Thriftserver

2016-09-13 Thread Benjamin Kim
Does anyone have any thoughts about using Spark SQL Thriftserver in Spark 1.6.2 instead of HiveServer2? We are considering abandoning HiveServer2 for it. Some advice and gotchas would be nice to know. Thanks, Ben - To unsubscri

Re: Spark SQL Thriftserver

2016-09-13 Thread Benjamin Kim
own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such

Using Spark SQL to Create JDBC Tables

2016-09-13 Thread Benjamin Kim
Has anyone created tables using Spark SQL that directly connect to a JDBC data source such as PostgreSQL? I would like to use Spark SQL Thriftserver to access and query remote PostgreSQL tables. In this way, we can centralize data access to Spark SQL tables along with PostgreSQL making it very c
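
In Spark 1.6 this can be done with a USING org.apache.spark.sql.jdbc definition; a sketch with placeholder connection details (temporary tables live only for the session that creates them):

```scala
sqlContext.sql("""
  CREATE TEMPORARY TABLE pg_customers
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url 'jdbc:postgresql://pg-host:5432/mydb',
    dbtable 'public.customers',
    user 'spark',
    password 'secret'
  )
""")
```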

Re: Using Spark SQL to Create JDBC Tables

2016-09-13 Thread Benjamin Kim
> tables which "point to" any other DB. i know Oracle provides there own Serde > for hive. Not sure about PG though. > > Once tables are created in hive, STS will automatically see it. > > On Wed, Sep 14, 2016 at 11:08 AM, Benjamin Kim <mailto:bbuil...@gmail.

JDBC Very Slow

2016-09-16 Thread Benjamin Kim
Has anyone using Spark 1.6.2 encountered very slow responses when pulling data from PostgreSQL using JDBC? I can get to the table and see the schema, but when I do a show, it takes a very long time or keeps timing out. The code is simple. val jdbcDF = sqlContext.read.format("jdbc").options( Map("u
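
One common cause is that a single-partition JDBC read funnels the whole table through one connection; partitioning the read usually helps. A sketch with placeholder bounds (partitionColumn must be numeric):

```scala
val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:postgresql://pg-host:5432/mydb",
  "dbtable" -> "public.big_table",
  "partitionColumn" -> "id",
  "lowerBound" -> "1",
  "upperBound" -> "10000000",
  "numPartitions" -> "8" // eight concurrent connections, one per partition
)).load()
jdbcDF.show()
```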

Re: JDBC Very Slow

2016-09-16 Thread Benjamin Kim
. Thanks, Ben > On Sep 16, 2016, at 3:29 PM, Nikolay Zhebet wrote: > > Hi! Can you split the init code from the current command? I think that is the main problem > in your code. > > On Sep 16, 2016, 8:26 PM, "Benjamin Kim" <mailto:bbuil...@gmail.com>> wrote:

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Benjamin Kim
Mich, I know up until CDH 5.4 we had to add the HTrace jar to the classpath to make it work using the command below. But after upgrading to CDH 5.7, it became unnecessary. echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> /etc/spark/conf/classpath.txt Hope this helps.

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-02 Thread Benjamin Kim
any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 1 October 2016 at 23:39, Benjamin Kim &

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
COLUMN+CELL > Tesco PLC > column=stock_daily:close, timestamp=1475447365118, value=325.25 > Tesco PLC > column=stock_daily:high, timestamp=1475447365118, value=332.00 > Tesc

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
> That sounds interesting, would love to learn more about it. > > Mitch: looks good. Lastly I would suggest you to think if you really need > multiple column families. > > On 4 Oct 2016 02:57, "Benjamin Kim" <mailto:bbuil...@gmail.com>> wrote: > Lately, I

Re: Deep learning libraries for scala

2016-10-03 Thread Benjamin Kim
I got this email a while back in regards to this. Dear Spark users and developers, I have released version 1.0.0 of scalable-deeplearning package. This package is based on the implementation of artificial neural networks in Spark ML. It is intended for new Spark deep learning features that wer

RESTful Endpoint and Spark

2016-10-06 Thread Benjamin Kim
Has anyone tried to integrate Spark with a server farm of RESTful API endpoints or even HTTP web-servers for that matter? I know it’s typically done using a web farm as the presentation interface, then data flows through a firewall/router to direct calls to a JDBC listener that will SELECT, INSE

Re: RESTful Endpoint and Spark

2016-10-07 Thread Benjamin Kim
On Oct 6, 2016, at 4:27 PM, Benjamin Kim wrote: >> >> Has anyone tried to integrate Spark with a server farm of RESTful API >> endpoints or even HTTP web-servers for that matter? I know it’s typically >> done using a web farm as the presentation interface, then data flows thro

Spark SQL Thriftserver with HBase

2016-10-07 Thread Benjamin Kim
Does anyone know if Spark can work with HBase tables using Spark SQL? I know in Hive we are able to create tables on top of an underlying HBase table that can be accessed using MapReduce jobs. Can the same be done using HiveContext or SQLContext? We are trying to setup a way to GET and POST data

Inserting New Primary Keys

2016-10-08 Thread Benjamin Kim
I have a table with data already in it that has primary keys generated by the function monotonicallyIncreasingId. Now, I want to insert more data into it with primary keys that will auto-increment from where the existing data left off. How would I do this? There is no argument I can pass into th
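
One workaround, sketched under the assumption of a single writer (as the thread later cautions, this is risky with concurrent inserts): offset the generated ids past the current maximum key.

```scala
import org.apache.spark.sql.functions.{col, max, monotonicallyIncreasingId}

// assumes id is a LongType column in the existing table
val maxKey = existingDf.agg(max(col("id"))).head.getLong(0)
// generated ids are unique but not contiguous; the offset keeps them
// strictly above every key already present
val withKeys = newRows.withColumn("id", monotonicallyIncreasingId() + (maxKey + 1))
```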

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
book.html#spark> > > And if you search you should find several alternative approaches. > > > > > > On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <mailto:bbuil...@gmail.com>> wrote: > > Does anyone know if Spark can work with HBase tab

Re: Inserting New Primary Keys

2016-10-08 Thread Benjamin Kim
damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 8 Octo

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
Thrift Server (with USING, > http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10 > <http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10>). > > > _ > From: Benjamin Kim mailto:bbuil...@gmail.com>

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
experience with this! > > > _____ > From: Benjamin Kim mailto:bbuil...@gmail.com>> > Sent: Saturday, October 8, 2016 11:00 AM > Subject: Re: Spark SQL Thriftserver with HBase > To: Felix Cheung <mailto:felixcheun...@hotmail.com>> > Cc: m

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
e it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such &

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
aming specifics, there are at least 4 or 5 different implementations > of HBASE sources, each at varying level of development and different > requirements (HBASE release version, Kerberos support etc) > > > _ > From: Benjamin Kim mailto:bbuil...

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage o

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Benjamin Kim
ll provide an in-memory cache for interactive analytics. You > can put full tables in-memory with Hive using Ignite HDFS in-memory solution. > All this does only make sense if you do not use MR as an engine, the right > input format (ORC, parquet) and a recent Hive version. > >

Re: Inserting New Primary Keys

2016-10-10 Thread Benjamin Kim
Is there only one process adding rows? because this seems a little risky if > you have multiple threads doing that… > >> On Oct 8, 2016, at 1:43 PM, Benjamin Kim > <mailto:bbuil...@gmail.com>> wrote: >> >> Mich, >> >> After much searching, I
