Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Hi Benjamin, How stable is Kudu? Is it production ready? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordp

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
If you’re interested, here is the link to the development page for Kudu. It has the Spark code snippets using DataFrames. http://kudu.apache.org/docs/developing.html Cheers, Ben > On Oct 3, 2016, at 9:56 AM, ayan guha wrote: > > That sounds inter

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
That sounds interesting, would love to learn more about it. Mich: looks good. Lastly, I would suggest you consider whether you really need multiple column families. On 4 Oct 2016 02:57, "Benjamin Kim" wrote: > Lately, I’ve been experimenting with Kudu. It has been a much better > experience than with

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
Lately, I’ve been experimenting with Kudu. It has been a much better experience than with HBase. Using it is much simpler, even from spark-shell. spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0 It’s like going back to rudimentary DB systems where tables have just a primary key and

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
With ticker+date I can create something like the below for the row key: TSCO_1-Apr-08 or TSCO1-Apr-08, if I understood you correctly Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
Hi, Looks like you are saving to new.csv but still loading tsco.csv? It's definitely the header. Suggestion: ticker+date as row key has the following benefits: 1. Using ticker+date as row key will enable you to hold multiple tickers in this single HBase table. (Think composite primary key) 2. Using dat

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Hi Ayan, Sounds like the row key has to be unique, much like a primary key in an RDBMS. This is what I download as a csv for a stock from Google Finance: Date Open High Low Close Volume 27-Sep-16 177.4 177.75 172.5 177.75 24117196 So what I do is add the stock and ticker myself to the end of the row via

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
I am not well versed with importtsv, but you can create a CSV file using a simple spark program to create first column as ticker+tradedate. I remember doing similar manipulation to create row key format in pig. On 3 Oct 2016 20:40, "Mich Talebzadeh" wrote: > Thanks Ayan, > > How do you specify ti
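
A minimal sketch of the row-key preparation discussed above, using awk from the shell rather than the Spark program Ayan suggests. It assumes the input is a comma-separated Google Finance download whose first line is the header (Date,Open,High,Low,Close,Volume) and that the target column order is row key, ticker, trade date, then the price columns. The function name `add_row_key` and the `ticker_date` key format are illustrative choices, not something the thread prescribes.

```shell
# Hypothetical helper (not from the thread): read a Google Finance style CSV
# on stdin and write rowkey,ticker,date,open,high,low,close,volume on stdout.
add_row_key() {
  ticker="$1"
  awk -F',' -v OFS=',' -v t="$ticker" '
    NR == 1 { next }                    # skip the header line so it is not loaded as a row
    NF > 0  { print t "_" $1, t, $0 }   # row key is ticker_date, e.g. TSCO_27-Sep-16
  '
}
```

For example, `add_row_key TSCO < tsco.csv > new.csv` would produce a file whose first column can be mapped to HBASE_ROW_KEY by ImportTsv. Dropping the header here also avoids loading it as a data row.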

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Thanks Ayan, How do you specify ticker+trade date as the row key in the command below? hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker, stock_daily:tradedate, stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stoc

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
Hi Mich, It has more to do with HBase than Spark. The row key can be anything, yes, but essentially what you are doing is inserting and updating the Tesco PLC row. Given your schema, ticker+trade date seems to be a good row key. On 3 Oct 2016 18:25, "Mich Talebzadeh" wrote: > thanks again. > > I added that jar

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Thanks again. I added that jar file to the classpath and that part worked. I was using spark-shell, so I have to use spark-submit for it to be able to interact with the map-reduce job. BTW when I use the command line utility ImportTsv to load a file into HBase with the following table format descri

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-02 Thread Benjamin Kim
We installed Apache Spark 1.6.0 alongside CDH 5.4.8 because Cloudera only had Spark 1.3.0 at the time, and we wanted to use Spark 1.6.0’s features. We borrowed the /etc/spark/conf/spark-env.sh file that Cloudera generated because it was customized to add jars first from paths listed

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-02 Thread Mich Talebzadeh
Thanks Ben. The thing is, I am using Spark 2 and no stack from CDH! Is this approach to reading/writing to HBase specific to Cloudera? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Benjamin Kim
Mich, I know up until CDH 5.4 we had to add the HTrace jar to the classpath to make it work using the command below. But after upgrading to CDH 5.7, it became unnecessary. echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> /etc/spark/conf/classpath.txt Hope this helps.
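
The plain `echo ... >>` above appends the jar path again every time it is run. A minimal sketch of a guarded variant follows, assuming the classpath file holds one path per line; the function name `add_to_classpath` is hypothetical and not part of CDH or Spark.

```shell
# Hypothetical helper (not part of CDH or Spark): append a jar path to a
# classpath file only if that exact line is not already present.
add_to_classpath() {
  jar="$1"
  file="$2"
  # -x matches whole lines, -F treats the path as a literal string, -q is silent
  grep -qxF -- "$jar" "$file" 2>/dev/null || printf '%s\n' "$jar" >> "$file"
}
```

With the paths from Ben's message, the call would be `add_to_classpath /opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar /etc/spark/conf/classpath.txt`.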

Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Mich Talebzadeh
Trying a bulk load using HFiles in Spark, as in the example below: import org.apache.spark._ import org.apache.spark.rdd.NewHadoopRDD import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor} import org.apache.hadoop.hbase.client.HBaseAdmin import org.apache.hadoop.hbase.mapreduce.TableInputForm